

THE AMERICAN STATISTICAL ASSOCIATION 








No. 2 





THE ANNALS 


of 


MATHEMATICAL 
STATISTICS 


FOUR DOLLARS PER ANNUM 










Published and Lithoprinted by 
EDWARDS BROTHERS, INC. 
ANN ARBOR, MICH. 














THE ANNALS OF MATHEMATICAL STATISTICS is 
affiliated with the American Statistical Association and is devoted 
to the theory and application of Mathematical Statistics. 


Published Quarterly: March, June, September, December 


Four Dollars per annum 


H.C. CARVER, Editor 
A. L. O'TOOLE, Associate Editor 


The Annals is not copyrighted: any articles or tables appearing therein may 
be reproduced in whole or in part at any time if accompanied by 
the proper reference to this publication 





Address: ANNALS OF MATHEMATICAL STATISTICS 
Post Office Box 171, Ann Arbor, Michigan 














THE ANNALS OF 


MATHEMATICAL STATISTICS 


VOL. 5    JUNE, 1934    NO. 2








CONTENTS 


On a New Method of Computing Non-Linear Regression Curves
    Walter Andersson
The Standard Error of Any Analytic Function of a Set of Parameters Evaluated by the Method of Least Squares . . . 107
    Walter A. Hendricks
Transformation of Non-Normal Frequency Distributions into Normal Distributions
    G. A. Baker
Invariants and Covariants of Certain Frequency Curves . . . 124
    Richmond T. Zoch
Quadrature of the Normal Curve . . . 136
    E. R. Enlow
Editorial: On a Best Value of R in Sample of R from a Finite Population of N . . . 146
    A. L. O'Toole
Editorial: Punched Card Systems and Statistics . . . 153
    H. C. Carver



















ON A NEW METHOD OF COMPUTING NON- 
LINEAR REGRESSION CURVES* 


By 


WALTER ANDERSSON,
fil. dr, Stockholm. 


In a memoir published in this journal in February 1930¹ Professor S. D. Wicksell pointed out that the well-known Pearson method² of computing skew regression curves by adopting the principle of least squares can be simplified, and in some direction generalized, by inserting some assumption concerning the distribution function of the population studied. After some remarks on the subject as advanced in the said memoir the problem was presented to me by Professor Wicksell. The results obtained by me as regards this problem were published as a part of my doctoral thesis.³ In the course of the official ventilation of my thesis Professor Wicksell made some interesting remarks concerning the relations between my solution and the general Pearson solution. His suggestion has led me to take up this special problem, which will be considered in the following lines.

* From the Statistical Institution of the University of Lund, Sweden.

¹ S. D. Wicksell, Remarks on Regression.

² Karl Pearson, On the General Theory of Skew Correlation and Non-Linear Regression; Mathematical Contributions to the Theory of Evolution / Drap. Comp. Res. Mem., Biom. Ser. II, 1905 /.

³ Walter Andersson, Researches into the Theory of Regression, chapters IV-VI / Kungl. Fysiografiska Sällskapets Handlingar, N. F. Bd. 43, Nr. I; also as Meddelande från Lunds Observatorium, Ser. II, Nr. 64 /.

1. We consider a bi-variate distribution and denote the variables $x$ and $y$. The distribution function, for the sake of simplicity being supposed discontinuous, may be

(1)    $z = F(x, y),$

so that

(2)    $\sum_x \sum_y F(x, y) = 1.$

Let $g(x)$ be the regression function of $y$ on $x$. Thus

(3)    $\bar{y}_x = g(x),$

where $\bar{y}_x$ denotes the mean value of the dependent variate $y$ for a fixed value of the independent variate $x$. Consequently we have

(4)    $g(x) = \dfrac{\sum_y y\, F(x, y)}{\sum_y F(x, y)}.$

We further observe that the marginal distribution of $x$ is

(5)    $f(x) = \sum_y F(x, y).$

Expanding the regression function in the series of Tchebycheff⁴ we put

(6)    $g(x) = \alpha_0\,\psi_0(x) + \alpha_1\,\psi_1(x) + \alpha_2\,\psi_2(x) + \cdots,$

where $\psi_i(x)$ are polynomials of the $i$-th orders, fulfilling the following condition of orthogonality

(7)    $\sum_x f(x)\,\psi_i(x)\,\psi_j(x) = 0, \quad \text{for } i \neq j,$

and

(8)    $\sum_x f(x)\,\bigl[g(x) - \alpha_0\,\psi_0(x) - \alpha_1\,\psi_1(x) - \cdots - \alpha_i\,\psi_i(x)\bigr]^2 = \text{Min}.$


From (7) and (8) it may be shown that the expansion by Tchebycheff carried to some order gives the same approximate expression for the regression as obtained by fitting a parabola of the same order to the mean values of $y$ for every value of $x$, each observation being allotted a weight proportional to the number of individuals possessing the value of $x$ in question. Thus, by using the series of Tchebycheff in treating the regression problem we have as a matter of fact applied the same method of describing the regression as applied by Yule⁵ and Pearson.⁶

⁴ Tchebycheff, Collected Works, Vol. I, pp. 203-230.

We observe that using the series of Tchebycheff we gain the advantage of being able to perform the graduation successively for the higher orders. With respect to this circumstance I have used the notation successive regression coefficients for the coefficients $\alpha_i$ of (6).

Working out the solution for these coefficients we obtain from (7) and (8),

(9)    $\alpha_i = \dfrac{\sum_x f(x)\,\psi_i(x)\,g(x)}{\sum_x f(x)\,[\psi_i(x)]^2},$

the polynomials $\psi_i(x)$ being determined from (7).
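The equivalence between the expansion (6) with coefficients (9) and a weighted least-squares parabola can be checked numerically. The following Python sketch is illustrative only: the function name `successive_regression` and the toy weights `f` and array means `g` are assumptions, not data from the memoir. The orthogonal polynomials are built by Gram-Schmidt on the grid of $x$ values, which is one convenient way of satisfying (7), not necessarily the route taken by the author.

```python
import numpy as np

def successive_regression(x, f, g, order):
    """Return (alphas, psis): coefficients (9) and the values of the
    orthogonal polynomials psi_i at the grid points x, for weights f."""
    x, f, g = (np.asarray(v, dtype=float) for v in (x, f, g))
    psis = []
    for i in range(order + 1):
        p = x ** i
        # Gram-Schmidt: remove the components along lower-order polynomials,
        # so that sum f * psi_i * psi_j = 0 for i != j, as in condition (7).
        for q in psis:
            p = p - (np.sum(f * p * q) / np.sum(f * q * q)) * q
        psis.append(p)
    alphas = [np.sum(f * p * g) / np.sum(f * p * p) for p in psis]
    return np.array(alphas), np.array(psis)

# Hypothetical marginal frequencies f(x) and array means g(x).
x = np.arange(-3.0, 4.0)
f = np.array([1, 4, 9, 12, 8, 5, 2], dtype=float)
f /= f.sum()
g = np.array([0.2, -0.1, 0.3, 0.8, 1.1, 1.9, 2.2])

alphas, psis = successive_regression(x, f, g, 3)
fit_orth = alphas @ psis

# Direct weighted least-squares cubic; np.polyfit squares its weights,
# hence sqrt(f) reproduces the weighting by f used in (8).
fit_direct = np.polyval(np.polyfit(x, g, 3, w=np.sqrt(f)), x)
```

Both fits coincide at every grid point, and lowering `order` leaves the earlier coefficients unchanged, which is the "successive" property referred to in the text.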

The successive regression coefficients, except $\alpha_0$, have been shown / see W. Andersson, Op. cit., pp. 14-15 / to be independent of the zero-values of the variables, and in some cases they are found to stand in simple relations to the well-known semi-invariants of Thiele.⁷ Especially when the distribution is assumed to be generated according to the hypothesis of elementary errors the semi-invariants of Thiele and the successive regression coefficients are closely related. In this respect the denomination semi-invariant regression coefficients may be suggested for the coefficients $\alpha_i$. The values of these coefficients ought to be derived in all more exhaustive studies of curved regression lines.

⁵ G. U. Yule, On the Significance of Bravais' Formulae for Regression, &c., in case of Skew Correlation / Proc. Roy. Soc., Vol. 60, pp. 477-489, 1897 /.

⁶ Pearson, Op. cit.

⁷ T. N. Thiele, Theory of Observations, London 1903, p. 24. / See also Annals of Mathematical Statistics, Vol. II, pp. 165-307, where this work of Thiele is reprinted /.

























We introduce the moments $\nu_{ij}$ of the distribution. Taking these about any point we have

(10)    $\nu_{ij} = \sum_x \sum_y x^i\, y^j\, F(x, y).$

If we observe that

(11)    $\sum_x f(x)\, x^h\, g(x) = \sum_x \sum_y x^h\, y\, F(x, y),$

it is immediately seen from (9) that the coefficients $\alpha_i$ can be expressed as linear functions of the "mixed" moments $\nu_{01}$, $\nu_{11}$, $\nu_{21}$, up to $\nu_{i1}$, all other quantities being dependent on the marginal moments of $x$ alone.

This solution may shortly be summed up. For a fuller discus- 
sion I refer to the cited memoir by the writer. 


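We write

(12)    $\psi_i(x) = x^i + e_{i,i-1}\,x^{i-1} + e_{i,i-2}\,x^{i-2} + \cdots + e_{i0}.$

Let $\Delta^{(i)}$ be the following determinant of the marginal moments of $x$,

(13)    $\Delta^{(i)} = \begin{vmatrix} \nu_{00} & \nu_{10} & \cdots & \nu_{i0} \\ \nu_{10} & \nu_{20} & \cdots & \nu_{i+1,0} \\ \vdots & \vdots & & \vdots \\ \nu_{i0} & \nu_{i+1,0} & \cdots & \nu_{2i,0} \end{vmatrix},$

and $\Delta^{(i)}_{hl}$ be its sub-determinant obtained by cutting out the $(h+1)$th row and the $(l+1)$th column and multiplying by $(-1)^{h+l}$. Then we have

(14)    $e_{ij} = \dfrac{\Delta^{(i)}_{ij}}{\Delta^{(i)}_{ii}},$

and

(15)    $\alpha_i = \dfrac{\Delta^{(i-1)}}{\Delta^{(i)}}\,\bigl[\nu_{i1} + e_{i,i-1}\,\nu_{i-1,1} + \cdots + e_{i1}\,\nu_{11} + e_{i0}\,\nu_{01}\bigr],$

or, using the "standardized" variables

(16)    $\xi = \dfrac{x - m_x}{\sigma_x}, \qquad \eta = \dfrac{y - m_y}{\sigma_y}$

($m$ = mean, $\sigma$ = dispersion) and introducing the coefficients

(17)    $s_{i0} = \varepsilon_{i1} - r\,\varepsilon_{i+1,0},$

where $\varepsilon_{ij}$ stands for the "standardized" moments and $r$ is the usual Galton coefficient of correlation, we have / W. Andersson, Op. cit., p. 16 /

(18)    $\alpha_i = \dfrac{\Delta^{(i-1)}}{\Delta^{(i)}}\,\bigl[s_{i0} + e_{i,i-1}\,s_{i-1,0} + \cdots + e_{i2}\,s_{20}\bigr].$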
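The determinant relations (12) to (15) lend themselves to a direct numerical sketch. The code below is an illustration under stated assumptions, not the author's computing scheme: the helper names (`hankel`, `hankel_det`, `e_coefficients`, `alpha`) are hypothetical, the marginal moments are those of a standard normal variate, and the regression function $g(x) = 2 + 0.3x + 0.5(x^2 - 1)$ is invented so that the expected coefficients are known in advance.

```python
import numpy as np

def hankel(m, i):
    """Moment matrix of order i built from marginal moments m[0..2i]."""
    return np.array([[m[h + l] for l in range(i + 1)] for h in range(i + 1)],
                    dtype=float)

def hankel_det(m, i):
    """Determinant of the moment matrix; taken as 1 for i = -1."""
    return 1.0 if i < 0 else float(np.linalg.det(hankel(m, i)))

def e_coefficients(m, i):
    """Coefficients e_{i0}, ..., e_{ii} of the monic polynomial psi_i,
    as ratios of signed sub-determinants of the last row."""
    if i == 0:
        return np.array([1.0])
    H = hankel(m, i)
    cof = np.empty(i + 1)
    for j in range(i + 1):
        minor = np.delete(np.delete(H, i, axis=0), j, axis=1)
        cof[j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof / cof[i]

def alpha(m, mixed, i):
    """Successive regression coefficient from marginal moments m and
    mixed moments mixed[h] = sum of x^h * y * F(x, y)."""
    e = e_coefficients(m, i)
    bracket = float(e @ np.asarray(mixed[: i + 1], dtype=float))
    return hankel_det(m, i - 1) / hankel_det(m, i) * bracket

# Standard normal marginal moments of orders 0..6 (assumed for the demo).
m = [1, 0, 1, 0, 3, 0, 15]
# Mixed moments of the invented regression g(x) = 2 + 0.3 x + 0.5 (x^2 - 1).
mixed = [2.0, 0.3, 3.0]
```

With these inputs `e_coefficients(m, 3)` returns the coefficients of $x^3 - 3x$, and `alpha(m, mixed, 2)` returns 0.5, the coefficient placed on $x^2 - 1$ in the invented regression.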


The relations between the successive or the semi-invariant regression coefficients $\alpha_i$ and the coefficients of the graduation parabolas as written in their usual forms are easily obtained. Taking the parabola of the $p$-th order

(19)    $\bar{y}_x = a_0^{(p)} + a_1^{(p)}\,x + a_2^{(p)}\,x^2 + \cdots + a_p^{(p)}\,x^p,$

we have, indeed, / Op. cit., p. 17 /

(20)    $a_0^{(p)} = \alpha_0 + e_{10}\,\alpha_1 + e_{20}\,\alpha_2 + \cdots + e_{p0}\,\alpha_p,$
        $a_1^{(p)} = \alpha_1 + e_{21}\,\alpha_2 + \cdots + e_{p1}\,\alpha_p,$
        $a_2^{(p)} = \alpha_2 + \cdots + e_{p2}\,\alpha_p,$
        $\cdots$
        $a_p^{(p)} = \alpha_p.$

The coefficients $e_{ij}$ are the same as those defined by (14).





2. Starting with the general solution just indicated we may proceed further into the matter. It will be seen that some new problems are met with in applying the general method to actual statistics.

Taking account of the fact that the solution only involves the 
moments of the distribution, we can free ourselves from any as- 
sumptions as regards the distribution function itself. The required 
moment values may then be directly computed from the observed 
frequencies. This way of solving the problem leads to the method 
advanced by Pearson in his treatises on this subject. The solution 
evidently gives a least squares graduation to the observed array 
means when the weights of each mean value are proportional to 
the observed frequencies in the corresponding arrays. 

This method may be the most straightforward one, but it is, however, by no means the simplest, nor the most efficient one. Considering the fact that the term of the $i$-th order of the parabola contains moments up to the $2i$-th order, we immediately conclude that the arithmetical work would rise to a considerable amount and, with growing moment order, be more and more in vain, as a consequence of the rapidly increasing sampling errors of the computed moment values. Some other ways to treat the problem must be sought for in order to eliminate these difficulties.
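How quickly the sampling errors grow with the moment order can be illustrated with a small calculation. The sketch below rests on assumptions that are not in the memoir: the variate is taken to be standard normal, the moments are taken about the known origin, and the $n$ observations are independent, so that the variance of the $k$-th sample moment is $(\mu_{2k} - \mu_k^2)/n$. The function names are hypothetical.

```python
from math import sqrt

def normal_moment(k):
    """k-th moment of the standard normal: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0.0
    out = 1.0
    for j in range(k - 1, 0, -2):
        out *= j
    return out

def moment_standard_error(k, n):
    """Standard error of the k-th sample moment about the origin,
    for n independent standard normal observations."""
    return sqrt((normal_moment(2 * k) - normal_moment(k) ** 2) / n)

# Standard errors of the moments of orders 1..6 for a sample of 1000.
errors = [moment_standard_error(k, 1000) for k in range(1, 7)]
```

Under these assumptions the standard error of the sixth moment is roughly seventy times that of the second, which is the kind of growth that makes a directly computed high-order parabola term unreliable.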

A first outline of a new method was suggested by S. D. Wick- 
sell in the year 1930 / Wicksell, Op. cit. /. Starting with the gen- 
eral solution Wicksell pointed out that some well-known rules of 
Thiele as regards the determination of moments of high orders 
were directly applicable in the computation of high order regression 
parabolas. The rules of Thiele referred to may be formulated in 
the following way / Thiele, Op. cit., p. 24 /: 

To obtain the first semi-invariants, or moments, rely entirely 
on computations. To obtain the intermediate semi-invariants rely 
partly on computations, partly on theoretical considerations. But 
to obtain the higher semi-invariants rely entirely on theoretical con- 
siderations. 










































Professor Wicksell’s suggestion was that instead of the higher 
marginal moments, involved in the least squares expressions for 
the regression coefficients, should be inserted the moments of a 
suitably chosen frequency function (with a limited number of 
parameters), fitted to the marginal distribution of the independent 
variate. 

The method indicated was then more thoroughly studied by the writer of these lines / Op. cit. /. The solution obtained along these lines was worked out in detail, and it was also tested as regards its practical usefulness in dealing with actual statistics. Especially by use of the Pearson types of frequency functions very simple expressions, successfully applicable within a large domain of actual statistics, were deduced.

An important advantage of this method / as well as of the
Pearson method / of computing high order regression parabolas
may be noticed. As the regression coefficients have been expressed
as functions of the moments only—in the method elaborated by 
the author only of those of low orders—the influence of “group- 
ing” may be accounted for by correcting the computed moment 
values in this respect. For this purpose suitable correction formu- 
las are available, as for instance the well-known ones given by 
Sheppard. Experience has convinced me that at the ends of the 
regression curves, at least, the effect of grouping can displace the 
computed curve in a considerable manner, so that in many cases 
some attention must be paid to these circumstances. 
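For reference, Sheppard's corrections in their usual textbook form for the central moments of a single grouped variate can be sketched as follows. This is a minimal illustration, assuming equal class widths and a frequency curve with high contact at both ends; the function name `sheppard_correct` is hypothetical, and the memoir does not state which variant of the corrections was applied to the product moments.

```python
def sheppard_correct(mu2, mu3, mu4, width):
    """Sheppard's corrections for central moments computed from grouped
    data with class width `width` (second to fourth order)."""
    mu2_c = mu2 - width ** 2 / 12.0
    mu3_c = mu3  # the third central moment needs no correction
    mu4_c = mu4 - 0.5 * width ** 2 * mu2 + 7.0 * width ** 4 / 240.0
    return mu2_c, mu3_c, mu4_c

# Example with moments expressed in class units (width = 1).
corrected = sheppard_correct(1.0, 0.2, 3.0, 1.0)
```

Since the corrections act on the moments only, they can be applied before the regression coefficients are formed, which is the advantage the paragraph above points out.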

It is, however, to be remembered that the solution obtained 
by applying Wicksell’s proposition does not give a strict least 
squares graduation to the observed array means, as a consequence 
of the fact that the theoretical values of the high order moments 
always in some degree differ from the directly computed ones. 
From this it is evident that some care must be taken in choosing the hypothesis as regards the marginal distribution of the independent variate. It may be remarked, however, that these circumstances cause very little practical difficulty on account of the much refined theory of uni-variate distributions.

The discrepancy between a least squares solution and the 
solution as obtained by applying the method as advanced by the 
author may, as pointed out to me by Professor Wicksell in the 
course of the official ventilation of my thesis, be removed by an 
adjustment by which the latter solution is turned into a strict 
least squares solution. This problem will be considered in the 
following paragraph, and at the same time we shall get an oppor- 
tunity to study the hypothetical assumptions applied before from 
a somewhat different point of view. 

3. We consider the expression (8). Before, we have from this condition worked out the general least squares solution in assuming $f(x)$ to be the true marginal distribution (5) and $g(x)$ to be the true regression function, and then 1/ the directly computed moment values were inserted in the general solution / Pearson's method /, or 2/ the moment values required were determined in accordance with the rules indicated in § 2 / method elaborated by the writer /. Now we shall directly introduce into (8) our working hypothesis concerning the marginal distribution of $x$. Let the hypothetical $x$-marginal distribution function be $w(x)$. The solution is then to be deduced from the following condition

(21)    $\sum_x w(x)\,\bigl[g(x) - \alpha_0\,\psi_0(x) - \alpha_1\,\psi_1(x) - \cdots - \alpha_i\,\psi_i(x)\bigr]^2 = \text{Min}.$

It is immediately clear that, in this way, we always get a strict least squares solution with respect to the distribution function $w(x)$ whatever the form of $g(x)$ may be.

In fact, the functions $f(x)$ and $g(x)$ are totally independent of one another, and, as is seen from (8), the distribution function $f(x)$ enters into the expansion of Tchebycheff for the regression function $g(x)$ only as a weight function which determines the weights to be allotted to the regression means in graduating the values of these by means of this series carried to a certain order, or, what is the same, by means of a parabola of the same order, the coefficients of which are determined according to the principle of least squares. Then it is clear that for practical purposes it is not necessary to derive the exact form of $f(x)$ in performing the expansion (6). The hypothetical distribution function $w(x)$ would be expected to give a satisfying result as soon as $w(x)$ in its main characteristics corresponds with the true distribution function $f(x)$.

I am going to work out the detailed solution for the following two usual forms of $w(x)$:

A/ Normal Error Function,

(22)    $w(\xi) = \dfrac{1}{\sqrt{2\pi}}\,e^{-\xi^2/2}.$

B/ Pearson Type III Function,

(23)    [equation illegible in the scan]

where $S$ is the skewness, or

(24)    [equation illegible in the scan]

In both cases the expressions for the terms of the series of Tchebycheff will be found to be very simple.

At first considering the polynomials $\psi_i(x)$ we are to have, in accordance with (7), the distribution function being continuous,

(25)    $\int dx\; w(x)\,\psi_i(x)\,\psi_j(x) = 0 \quad (i \neq j).$














From this expression it may be concluded that the polynomials $\psi_i(x)$ are in case A/ the polynomials of Hermite, and in case B/ those of Laguerre. Both these kinds of polynomials are of well-known forms, and consequently the values of the $e$-coefficients as defined by (12) may easily be derived from propositions about these polynomials.

For the successive coefficients we have according to (9) the following expression

(26)    $\alpha_i = \dfrac{\int dx\; w(x)\,\psi_i(x)\,g(x)}{\int dx\; w(x)\,[\psi_i(x)]^2}.$

Taking account of (13), (14), and (15) and introducing the notation

(27)    $\bar{\nu}_h = \int_{-\infty}^{+\infty} dx\; w(x)\,x^h\,g(x),$

we obtain

(28)    $\alpha_i = \dfrac{\bar{\Delta}^{(i-1)}}{\bar{\Delta}^{(i)}}\,\bigl[\bar{\nu}_i + \bar{e}_{i,i-1}\,\bar{\nu}_{i-1} + \cdots + \bar{e}_{i1}\,\bar{\nu}_1 + \bar{e}_{i0}\,\bar{\nu}_0\bigr],$

or, introducing the corresponding "standardized" moments $\bar{\varepsilon}_h$ and putting

(29)    $\bar{s}_{i0} = \bar{\varepsilon}_i - \bar{\varepsilon}_1\,\bar{\varepsilon}_{i+1,0},$

we get

(30)    $\alpha_i = \dfrac{\bar{\Delta}^{(i-1)}}{\bar{\Delta}^{(i)}}\,\bigl[\bar{s}_{i0} + \bar{e}_{i,i-1}\,\bar{s}_{i-1,0} + \cdots + \bar{e}_{i2}\,\bar{s}_{20}\bigr].$

The coefficients $\bar{\Delta}^{(i)}$ and $\bar{e}_{ij}$ are determined by (14) and (15) when the moment values

(31)    $\bar{\nu}_{h0} = \int_{-\infty}^{+\infty} dx\; w(x)\,x^h,$

respectively

(32)    $\bar{\varepsilon}_{h0} = \int d\xi\; w(\xi)\,\xi^h,$

are inserted in the determinants.

In the cases here considered we get very simple expressions for these determinants. We have, indeed, / W. Andersson, Op. cit., pp. 88 and 123 / in case of normal distribution

(33)    $\dfrac{\bar{\Delta}^{(i-1)}}{\bar{\Delta}^{(i)}} = \dfrac{1}{i!},$

and in case of Pearson type III distribution

(34)    $\dfrac{\bar{\Delta}^{(i-1)}}{\bar{\Delta}^{(i)}} = \dfrac{1}{i!\,\prod_{k=1}^{i-1}\,(1 + k\,S^2)}.$

The values of the $e$-coefficients necessary for the computation of the terms of the series of Tchebycheff up to the fifth order are given in the following table.

[Table: $e$-coefficients up to the fifth order for the normal and the Type III cases. Only fragments of the Type III column, expressed as polynomials in $S$, survive in the scan.]








In order to derive the expressions for the computation of the moment quantities $\bar{\nu}_h$, or $\bar{\varepsilon}_h$, we denote the class-breadth by $\omega$ and the observed mean value of $y$ in the $p$-th array of $x$ by $\bar{y}_p$. The values of $\bar{\nu}_h$ are then given by the following formula

(35)    $\bar{\nu}_h = \sum_p x_p^h\,\bar{y}_p\,R_p,$

where

(36)    $R_p = \int_{x_p - \omega/2}^{x_p + \omega/2} dx\; w(x).$

The computation is easily performed as soon as the function

(37)    $\Phi(x) = \int_{-\infty}^{x} dx\; w(x)$

is known. In either case we have access to suitable tables of this function. For the Pearson type III function the "Tables for the incomplete Γ-function, edited by Karl Pearson" are to be used.
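Formulas (35) to (37) amount to weighting each observed array mean by the hypothetical probability of its class. A minimal sketch for the normal case follows. The names `normal_cdf`, `class_weights` and `semi_theoretical_moment` are hypothetical, the error function from the Python standard library stands in for the printed tables, and the class midpoints are assumed to be given in standardized units already.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Phi(x) of (37) for the standard normal weight function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def class_weights(midpoints, width):
    """R_p of (36): probability of each class under the normal hypothesis."""
    return [normal_cdf(x + width / 2) - normal_cdf(x - width / 2)
            for x in midpoints]

def semi_theoretical_moment(midpoints, array_means, width, h):
    """Moment of (35): sum over classes of x_p^h * mean of y in class p * R_p."""
    weights = class_weights(midpoints, width)
    return sum(x ** h * y * r
               for x, y, r in zip(midpoints, array_means, weights))

# Twelve unit classes covering -6..6, with midpoints -5.5, ..., 5.5.
midpoints = [k + 0.5 for k in range(-6, 6)]
ones = [1.0] * len(midpoints)
total = semi_theoretical_moment(midpoints, ones, 1.0, 0)
second = semi_theoretical_moment(midpoints, ones, 1.0, 2)
```

With all array means set to one, `total` is the total class probability and `second` is the grouped second moment of the weight function itself, which exceeds unity by very nearly $\omega^2/12$. That excess is the grouping effect that Sheppard's corrections are meant to remove.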


4. We will now make some general remarks concerning the relations between the different methods of computing regression parabolas touched upon in the preceding lines. We start with the general condition (8) for the determination of the coefficients:

$\sum_x f(x)\,\bigl[g(x) - \alpha_0\,\psi_0(x) - \alpha_1\,\psi_1(x) - \cdots - \alpha_i\,\psi_i(x)\bigr]^2 = \text{Min}.$

It is seen that the expansion is determined by the marginal distribution function $f(x)$ and the regression function $g(x)$. If $f(x)$ and $g(x)$ are not the true functions of the population but the functions corresponding to the actual sample, the solution will give the sampling values of the coefficients. This is the solution advanced by Pearson, and consequently in his method no graduation of the data is performed in order to smooth out the influence of sampling irregularities on the values of the coefficients. Without any further considerations it is clear that methods which include an adjustment of the data in this respect are desirable. The problem is analogous with that occurring in the general theory of distributions. Among other facts of great importance that speak in favour of using mathematical functions for the description of distributions one is that we in this way are able to eliminate in some degree the accidental irregularities. When the regression is described by the series of Tchebycheff the smoothing process is evidently performed firstly by graduating the regression means by a parabola, and secondly by adjusting the parabola coefficients for the accidental irregularities. This latter adjustment has been accounted for by the two methods treated by the author. When using the rules of § 2 as principle for this adjustment the smoothing process is applied to the moment values involved in the general solution for the coefficients, and in the methods indicated in the preceding paragraphs we have used a weight function which is to be considered as a graduation of the observed marginal distribution of the independent variate.

As mentioned before we do not get a strict least squares solu- 
tion when applying the rules of § 2. This is, however, of little 
practical importance, but it remains to see in what manner this 
solution is to be modified in order to become a least squares grad- 
uation of the observed array means. 

When applying the rules of § 2 the product moments are computed from the following expression

(38)    $\nu_{h1} = \dfrac{1}{N}\,\sum_p x_p^h\,\bar{y}_p\,n_p,$

where $N$ is the total number of observations and $n_p$ the number of observations in the $p$-th array of $x$. We suppose that the graduation is to be based on directly computed moment values up to the $k$-th order, $k$ usually not being greater than six, in accordance with the rules of Thiele. The values of the marginal moments of the independent variate up to the $k$-th order indicate the distribution function $f(x;\,\nu_{10}, \nu_{20}, \ldots, \nu_{k0})$, which function is chosen as the theoretical distribution function determining the values of the marginal moments of orders above the $k$-th. A strict least squares solution with respect to the distribution function $f(x;\,\nu_{10}, \ldots, \nu_{k0})$ may be worked out according to the formulas given in this memoir by taking $w(x) = f(x;\,\nu_{10}, \nu_{20}, \ldots, \nu_{k0})$. In this case we have for the product moments the following values

(39)    $\bar{\nu}_h = \sum_p x_p^h\,\bar{y}_p\,R_p,$

where

(40)    $R_p = \int_{x_p - \omega/2}^{x_p + \omega/2} dx\; f(x;\,\nu_{10}, \nu_{20}, \ldots, \nu_{k0}).$

Subtracting (38) from (39) we get

(41)    $\Delta\nu_h = \sum_p \Bigl(R_p - \dfrac{n_p}{N}\Bigr)\,x_p^h\,\bar{y}_p,$





which consequently are the corrections to be added to the directly 
computed values of the product moments of the solution worked 
out in accordance with the rules of § 2, in order to obtain a strict 
least squares solution. 
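The corrections (41) are simple weighted sums once the class probabilities are available. The sketch below is illustrative only: the function name `least_squares_corrections` and the three-class toy data are assumptions, and the class probabilities `R` are passed in directly rather than obtained from a fitted frequency function.

```python
def least_squares_corrections(midpoints, array_means, counts, R, max_order):
    """Corrections of (41) for h = 0..max_order: the sum over classes of
    (R_p - n_p / N) * x_p^h * (mean of y in class p)."""
    N = sum(counts)
    return [
        sum((r - n / N) * x ** h * y
            for x, y, n, r in zip(midpoints, array_means, counts, R))
        for h in range(max_order + 1)
    ]

# Invented data: three classes, observed counts and hypothetical probabilities.
midpoints = [-1.0, 0.0, 1.0]
array_means = [1.0, 2.0, 3.0]
counts = [1, 2, 1]
R = [0.3, 0.4, 0.3]
corrections = least_squares_corrections(midpoints, array_means, counts, R, 2)
```

When the hypothetical probabilities equal the observed relative frequencies every correction vanishes, which matches the statement that the two solutions differ only through the discrepancy between the fitted and the observed marginal distribution.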

These corrections are easily computed as soon as the integrals $R_p$ are determined. This task, however, would in some cases be somewhat arduous. If the general Pearson theory of frequency curves is applied we must sometimes resort to mechanical quadrature formulas.


[...] a remark concerning the correction of the grouping of the moments $\bar{\nu}_h$. According to the method of computing these characteristics we may regard them as mixed moments of a distribution having as its $x$-marginal distribution the function $w(x)$, the regression means being the observed ones. Thus we evidently can apply the usual methods of correcting computed moment values for the effect of grouping. By using the formulas of Sheppard we have to observe, however, that the moments involved in these formulas must be referred to the supposed semi-theoretical distribution.


5. Numerical Illustrations. In order to illustrate the appli- 
cation to observed data of the consideration above I have numeri- 
cally treated a few populations—representative ones in that they 
are examples of correlation distributions of different degrees of 
skewness. 

Example I. Case of slightly skew correlation. Pearson's example B. Example I:2 and II:2 of the cited memoir of the author. Population: Correlation between age and height of head in 2272 girls.

/ $x$ = age; $y$ = height of head /

$x_0 = 12.5$ yrs., $\omega_x = 1$, $\bar{x} - x_0 = +.2007$, $\xi = .3263\,x' - .0655$

$y_0 = 125.25$ mm., $\omega_y = 2$, $\bar{y} - y_0 = -.6017$, $\eta = .2895\,y' + .1742$.

As regards the moment values I refer to the memoir of Pearson. These indicate that the marginal distribution of $x$ may approximately be represented by the normal curve. Thus I take for $w$ the normal function.

We have to calculate the product moments $\bar{\nu}_1$, $\bar{\nu}_2$, and $\bar{\nu}_3$.

These computations may be performed by using the following scheme. The different values are derived from the correlation table given in Pearson's memoir.

The values of $\xi$ correspond to the class ranges.


















































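[Computing scheme for Example I. Only one column of the table survives in the scan; its entries, in order, are: 3645.000, 2121.216, 1333.927, 664.200, 309.250, 115.712, 47.601, 9.736, 1.054, 0.000, -0.194, 1.856, 12.231, 41.088, 104.000, 191.160, 738.822, -365.568, 455.625; final entry 0.000.]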





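We get three sums of the form $\sum_p x_p^h\,\bar{y}_p\,R_p$ for $h = 1, 2, 3$, of which the scan preserves the figures ...9879, 6.6564 and 75.5197 (signs and leading digits partly illegible), and from these values

$\bar{\nu}_1 = 3.1123, \quad \bar{\nu}_2 = -2.2376, \quad \bar{\nu}_3 = 79.2110.$

Sheppard's corrections for grouping have been applied.

For the corresponding standardized moments we obtain the following values:

$\bar{\varepsilon}_1 = 0.2941, \quad \bar{\varepsilon}_2 = -0.0689, \quad \bar{\varepsilon}_3 = 0.8000.$

This leads to the following values of the $s$-coefficients defined by (29):

$\bar{s}_{20} = -0.0689, \quad \bar{s}_{30} = -0.0823.$

The values of the successive regression coefficients then become

$\bar{\alpha}_1 = 0.2941, \quad \bar{\alpha}_2 = -0.0345, \quad \bar{\alpha}_3 = -0.0137.$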
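The figures of Example I can be retraced with a few lines of arithmetic. The sketch below assumes, as the text does, a normal marginal hypothesis, so that the standardized marginal moments of orders three and four are 0 and 3 and the ratios of determinants for the second and third orders are $1/2!$ and $1/3!$. The variable names are hypothetical, and the standardized moments are the printed ones as read from the scan.

```python
# Standardized semi-theoretical moments of Example I (as printed).
eps1, eps2, eps3 = 0.2941, -0.0689, 0.8000
# Standardized marginal moments of the normal hypothesis (orders 3 and 4).
eps30, eps40 = 0.0, 3.0

# s-coefficients: each moment less eps1 times the next marginal moment.
s20 = eps2 - eps1 * eps30
s30 = eps3 - eps1 * eps40

# Successive regression coefficients for the normal case.
alpha1 = eps1
alpha2 = s20 / 2.0   # determinant ratio 1/2!
alpha3 = s30 / 6.0   # determinant ratio 1/3!; the Hermite e_{32} is zero
```

The arithmetic gives $s_{30} \approx -0.0823$ and a third coefficient of about $-0.0137$, in agreement with the cubic term of the regression parabola printed for this example.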


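Comparing these different values with the uncorrected ones (the barred quantities being the adjusted values) we find

$\bar{\varepsilon}_1 - \varepsilon_{11} = 0.0000$        $\bar{s}_{20} - s_{20} = +0.0021$
$\bar{\varepsilon}_2 - \varepsilon_{21} = -0.0086$        $\bar{s}_{30} - s_{30} = -0.0337$
$\bar{\varepsilon}_3 - \varepsilon_{31} = +0.0511$        $\bar{\alpha}_1 - \alpha_1 = +0.0000$
$\bar{\varepsilon}_{30} - \varepsilon_{30} = -0.0365$        $\bar{\alpha}_2 - \alpha_2 = +0.0010$
$\bar{\varepsilon}_{40} - \varepsilon_{40} = +0.2894$        $\bar{\alpha}_3 - \alpha_3 = -0.0056$

We especially observe that the adjustments of the $s$-coefficients are smaller than those of the moments of the same orders.

The adjusted coefficients result in the following regression parabola of the third order:

$\eta_\xi = +0.0345 + 0.3352\,\xi - 0.0345\,\xi^2 - 0.0137\,\xi^3.$

The curve is drawn on diagram 1. For the sake of comparison the graph of the Pearson curve and that obtained by applying the rules of Thiele, the marginal being the normal curve, are given on the same diagram.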











Example II. Case of moderately skew correlation. Population: Correlation between weight of newborn boy and weight of placenta; material supplied by the Maternity Hospital of Lund, Sweden. Example 2 in S. D. Wicksell: "Correlation Function of Type A, etc." / Kungl. Svenska Vetenskapsakademiens handlingar, Bd 58, Nr. 3 /. N = 1223.

/ $x$ = weight of boy; $y$ = weight of placenta /

$x_0 = 3950$ g., $\omega_x = 300$, $\bar{x} - x_0 = +.4685$, $\xi = .5940\,x' - .2783$

$y_0 = 630$ g., $\omega_y = 80$, $\bar{y} - y_0 = -.5715$, $\eta = .6954\,y' + .3974$.

The correlation table and the computed moment values are given in the said memoir of Wicksell. For $w$ we take the normal function.


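Calculating the moments $\bar{\nu}_1$, etc. in the same manner as used in the first example we obtain the values

$\bar{\nu}_1 = 1.5540, \quad \bar{\nu}_2 = 0.3412, \quad \bar{\nu}_3 = 11.7653,$

which give the following values of the standardized moments:

$\bar{\varepsilon}_1 = 0.6420, \quad \bar{\varepsilon}_2 = 0.0837, \quad \bar{\varepsilon}_3 = 1.7153.$

The values are corrected for grouping.

We further get

$\bar{s}_{20} = +0.0837, \quad \bar{s}_{30} = -0.2106$

and

$\bar{\alpha}_1 = +0.6420, \quad \bar{\alpha}_2 = +0.0419, \quad \bar{\alpha}_3 = -0.0351.$

The values of the adjustments of the different quantities (adjusted less unadjusted) are given below:

$\bar{\varepsilon}_1 - \varepsilon_{11} = -0.0035$        $\bar{s}_{20} - s_{20} = -0.0656$
$\bar{\varepsilon}_2 - \varepsilon_{21} = +0.0196$        $\bar{s}_{30} - s_{30} = -0.0173$
$\bar{\varepsilon}_3 - \varepsilon_{31} = -0.2132$        $\bar{\alpha}_1 - \alpha_1 = -0.0035$
$\bar{\varepsilon}_{30} - \varepsilon_{30} = +0.1320$        $\bar{\alpha}_2 - \alpha_2 = -0.0328$
$\bar{\varepsilon}_{40} - \varepsilon_{40} = -0.2870$        $\bar{\alpha}_3 - \alpha_3 = -0.0031.$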


The correction of $\alpha_2$ is rather great, but not greater than was to be expected with consideration to the roughness of the fit of the hypothetical marginal distribution function. It is clear that when applying the solution of my previous paper in this example we should use a type IV curve for the marginal distribution. The unadjusted values of the parabola coefficients are also in this case easily computed, but the calculation of the adjustments by which the solution is turned into a least squares solution would be very laborious.

In order to illustrate the suitability of the several methods I have drawn the following curves on diagram 2: 1/ unadjusted solution, hypothetical marginal distribution being the normal function; 2/ adjusted solution, hypothetical marginal distribution being the normal function; 3/ unadjusted solution, marginal distribution being Pearson's type IV function.

The equation of the second curve is

$\eta_\xi = -0.0419 + 0.7473\,\xi + 0.0419\,\xi^2 - 0.0351\,\xi^3.$


The third curve is undoubtedly best fitted to the data. 
Example III. Case of extremely skew correlation. The correlation between the age of bachelor and the age of spinster at marriage, Sweden 1911-1920. Example I:4 and II:7 in the cited memoir of the author. N = 321908.

/ $x$ = age of spinster; $y$ = age of bachelor /

$x_0 = 27.5$ yrs., $\omega_x = 5$, $\bar{x} - x_0 = -.3130$, $\xi = .9515\,x' + .2927$

$y_0$ = [illegible] yrs., $\omega_y = 5$, $\bar{y} - y_0 = +.2624$, $\eta = .8506\,y' - .2424$.

The moment values as given in the said memoir indicate that we can use Pearson's type III function as hypothetical distribution function for the $x$-marginal distribution. From the moment values as computed in the cited memoir we obtain the following values of the constants of this function: 1.4312 and 2.0483 (the symbols of the two constants are illegible in the scan).

It is to be remarked that for our purposes the computation of the constant $C$ is not necessary.

For the $e$-coefficients we get the following values:

$\bar{e}_{21} = -1.3974$        $\bar{e}_{32} = -4.1922$        $\bar{e}_{43} = -8.3844$
$\bar{e}_{20} = -1.0000$        $\bar{e}_{31} = -0.0708$        $\bar{e}_{42} = +11.5752$
                                $\bar{e}_{30} = +2.7948$        $\bar{e}_{41} = +11.3771$
                                                                $\bar{e}_{40} = -5.7876.$
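The $e$-coefficients of this example can be reproduced from the skewness alone, which gives a useful check on the figures. The sketch below makes assumptions that are inferred from the printed values rather than read from the memoir: the standardized Type III function is taken to be a standardized gamma distribution with shape $p = 1/S^2 = 2.0483$, so that its third standardized moment is $2S$. The function names are hypothetical, and the polynomial coefficients are obtained by solving the orthogonality conditions as a linear system instead of evaluating determinants.

```python
import numpy as np
from math import comb

def standardized_type3_moments(S, kmax):
    """Moments of orders 0..kmax of a gamma variate with shape p = 1/S^2,
    measured from its mean in units of its dispersion."""
    p = 1.0 / S ** 2
    raw = [1.0]
    for k in range(kmax):
        raw.append(raw[-1] * (p + k))      # E[t^(k+1)] = E[t^k] * (p + k)
    moments = []
    for k in range(kmax + 1):
        central = sum(comb(k, j) * raw[j] * (-p) ** (k - j)
                      for j in range(k + 1))
        moments.append(central / p ** (k / 2))
    return moments

def monic_coefficients(moments, i):
    """Coefficients e_{i0}, ..., e_{ii} (with e_{ii} = 1) of the monic
    polynomial of order i orthogonal to 1, x, ..., x^(i-1)."""
    H = np.array([[moments[h + l] for l in range(i)] for h in range(i)])
    rhs = -np.array([moments[h + i] for h in range(i)])
    return np.append(np.linalg.solve(H, rhs), 1.0)

S = 2.0483 ** -0.5                         # skewness constant, about 0.6987
mom = standardized_type3_moments(S, 8)
e2 = monic_coefficients(mom, 2)
e3 = monic_coefficients(mom, 3)
e4 = monic_coefficients(mom, 4)
```

Under these assumptions `e2`, `e3` and `e4` agree with the printed list to about three decimals, the small differences being attributable to the rounding of the constant 2.0483.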





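For the unadjusted values of the $s$-coefficients and the successive regression coefficients we further get

$s_{20} = +0.1787, \quad s_{30} = +0.5255, \quad s_{40} = +2.8763,$

$\alpha_1 = +0.5535, \quad \alpha_2 = +0.05192, \quad \alpha_3 = -0.012669, \quad \alpha_4 = +0.003097.$

Computing the corresponding adjusted values by use of "Tables of the incomplete Γ-function" we obtain

$\bar{s}_{20} = +0.1723, \quad \bar{s}_{30} = +0.4638, \quad \bar{s}_{40} = +2.0851,$

$\bar{\alpha}_1 = +0.5528, \quad \bar{\alpha}_2 = +0.05789, \quad \bar{\alpha}_3 = -0.014647, \quad \bar{\alpha}_4 = +0.001097.$

Sheppard's corrections have been applied in both cases. The differences between the adjusted and the unadjusted values are

$\bar{\varepsilon}_1 - \varepsilon_{11} = -0.0007$        $\bar{\varepsilon}_{30} - \varepsilon_{30} = 0.0000$
$\bar{\varepsilon}_2 - \varepsilon_{21} = -0.0034$        $\bar{\varepsilon}_{40} - \varepsilon_{40} = -0.4762$
$\bar{\varepsilon}_3 - \varepsilon_{31} = -0.3237$        $\bar{\varepsilon}_{50} - \varepsilon_{50} = -2.0405$
$\bar{\varepsilon}_4 - \varepsilon_{41} = -1.4548$        $\bar{\alpha}_1 - \alpha_1 = -0.0007$
$\bar{s}_{20} - s_{20} = -0.0064$        $\bar{\alpha}_2 - \alpha_2 = +0.00597$
$\bar{s}_{30} - s_{30} = -0.0617$        $\bar{\alpha}_3 - \alpha_3 = -0.001978$
$\bar{s}_{40} - s_{40} = -0.7912$        $\bar{\alpha}_4 - \alpha_4 = -0.002000.$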


The parabolas of the third and the fourth orders are the following ones:

Unadjusted values of the coefficients:

$\eta_\xi = -.0873 + .4818\,\xi + .1050\,\xi^2 - .01267\,\xi^3$

$\eta_\xi = -.1053 + .5171\,\xi + .1409\,\xi^2 - .03864\,\xi^3 + .003097\,\xi^4.$

Adjusted values of the coefficients:

$\eta_\xi = -.0988 + .4729\,\xi + .1193\,\xi^2 - .01465\,\xi^3$

$\eta_\xi = -.1052 + .4854\,\xi + .1320\,\xi^2 - .02384\,\xi^3 + .001097\,\xi^4.$

The graphs are drawn on diagrams 3 and 4.
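The passage from the successive coefficients to the coefficients of the parabola, relations (20), can be retraced for the adjusted solution of this example. The sketch is illustrative: the table `E` holds the $e$-coefficients of the example as read from the scan (the assignment of indices is an inference), the first-order polynomial is taken as $\xi$ itself, and $\alpha_0$ is assumed to be zero because the variables are standardized. The function name `parabola` is hypothetical.

```python
# e-coefficients of psi_1..psi_4, lowest power first, leading coefficient 1.
E = {
    1: [0.0, 1.0],
    2: [-1.0000, -1.3974, 1.0],
    3: [2.7948, -0.0708, -4.1922, 1.0],
    4: [-5.7876, 11.3771, 11.5752, -8.3844, 1.0],
}

def parabola(alphas, order):
    """Power-series coefficients a_0..a_order of the parabola of the given
    order: a_j is the sum over i >= j of e_{ij} * alpha_i, as in (20)."""
    coeffs = [0.0] * (order + 1)
    coeffs[0] = alphas[0]
    for i in range(1, order + 1):
        for j, e in enumerate(E[i]):
            coeffs[j] += e * alphas[i]
    return coeffs

# Adjusted successive regression coefficients alpha_0..alpha_4 (alpha_0 assumed 0).
adjusted = [0.0, 0.5528, 0.05789, -0.014647, 0.001097]
third = parabola(adjusted, 3)
fourth = parabola(adjusted, 4)
```

The resulting coefficients agree with the adjusted parabolas of the third and fourth orders to the number of decimals printed; the same function applied to the unadjusted coefficients reproduces the unadjusted parabolas.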




The results indicated by the few examples treated in this 
paragraph clearly point out that the Tchebycheff expansion can- 
not be considered as a least squares graduation of the observed 







regression means when the moment values involved in the solu- 
tion are determined in accordance with the rules laid out in § 2. 
As regards the practical applicability of such a solution, how- 
ever, this circumstance is of little importance, because the curve 
in this case is found to give as good, and sometimes a better rep- 
resentation of the regression than a strict least squares graduation. 
Further, as the calculation of the moments of the first few orders 
is often required for other purposes than the determination of 
the regression curve, the computation of the unadjusted solution 
in these cases is arithmetically very simple. Not having access to 
the moment values we may perhaps in some cases consider the 
direct computation of the adjusted solution as performed in ex- 
ample I to be the simplest method. The adjusting of correctly 
determined unadjusted solutions would certainly very seldom be 


of real gain. 


Stockholm, September 1933. 


S. D. WICKSELL
Note on Dr. Andersson’s Paper. 


In an extensive memoir, Researches into the theory of Re-
gression, Dr. W. Andersson has worked out a very simple and
widely applicable numerical method of computing curved regres- 
sions. The general principle on which this method was founded 
Dr. Andersson has kindly attributed to me. It was laid out in 
my paper in the first number of the “Annals” Journal and may 
be stated as follows: After fitting a suitable univariate frequency 
function with a limited number of parameters—e.g. the normal 
curve or one of Pearson's types—to the marginal distribution of 


the independent variate, the moments of this function—which are 


all expressible in terms of the parameters—should be used in 


computing the regression coefficients, instead of the ordinary 
















values (power means). Of course, when, in fitting the curve, the 
ordinary moments of lower orders have been used in determining
the parameters, this procedure means that the moments of higher 
orders are theoretically expressed in terms of the moments of 
lower orders instead of being directly computed. 

Applying this device to the ordinary least squares expres- 
sions for the regression coefficients, it was clear that a departure 
from the least square condition took place, but the chances were 
that this would not harm the result, and the computations would 
be much simplified. Dr. Andersson's investigation has shown 
that these expectations were highly justified. 

During the official ventilation of the memoir, which was pre- 
sented as Thesis for the degree of D.Ph., it was agreed that 
the method ought to be tested by a comparison with a theoretically 
very similar method in which the least squares condition was re- 
tained, although theoretical or semi-empirical weights were intro- 
duced instead of the purely empirical weights used in the method 
of Karl Pearson. 

In the present paper Dr. Andersson has taken this question 
up and he shows that whereas the original (unadjusted) method 
is numerically simpler in application, it gives practically just as 
good regression curves as the new, adjusted method. In some 
cases he even considers the unadjusted solution to be the better one. 

By this the incident may seem to be closed. I should, how- 
ever, like to point out, in a few words, how very straightforward 
a principle it is, which lies behind this adjusted method. 

It is simply this: When a correlation table is given, the re-
gression of y on x will not be affected by multiplying the frequen-
cies within any x-array by a constant factor. Hence the follow-
ing procedure will not affect the regression of y on x; i.e., the
process of reducing or adjusting the frequencies in the several 
x arrays so that the marginal sums will be equal to the smoothed 


frequencies, corresponding to any mathematical curve which has 




been fitted to the marginal distribution. Thus, on applying Pear- 


son’s ordinary least squares solution to this adjusted table a least 


squares regression parabola would be obtained in which the mar-
ginal moments were those of the smoothed distribution, and also
the mixed moments were, although only in a secondary degree, 
affected by the smoothing of the marginal. It is only in this last 
respect, i.e. as regards the mixed moments, that this method devi- 
ates from the one originally proposed. 
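The principle stated above, that rescaling the frequencies within each x-array leaves the regression of y on x untouched, can be checked numerically. The following is only a minimal sketch: the correlation table, the y-values and the smoothed marginal frequencies are hypothetical numbers chosen for illustration, and the helper names `array_mean` and `adjust` are not taken from the paper.

```python
# Hypothetical correlation table: each key is an x-array, each list holds
# the frequencies of the y-values 1, 2, 3 within that array.
y_values = [1.0, 2.0, 3.0]
table = {
    10: [4, 6, 2],
    20: [1, 5, 9],
    30: [3, 3, 6],
}

def array_mean(freqs, ys):
    """Mean of y within one x-array, i.e. the regression value for that x."""
    return sum(f * y for f, y in zip(freqs, ys)) / sum(freqs)

def adjust(table, smoothed_marginal):
    """Rescale every x-array so that its total equals the smoothed marginal."""
    adjusted = {}
    for x, freqs in table.items():
        factor = smoothed_marginal[x] / sum(freqs)
        adjusted[x] = [f * factor for f in freqs]
    return adjusted

# Assumed smoothed marginal frequencies (illustrative only).
smoothed = {10: 10.5, 20: 17.0, 30: 11.5}
adjusted = adjust(table, smoothed)

means_before = {x: array_mean(f, y_values) for x, f in table.items()}
means_after = {x: array_mean(f, y_values) for x, f in adjusted.items()}
```

The array means agree before and after the adjustment, while the marginal totals now follow the smoothed distribution; only the moments computed over the whole table change.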

In my opinion many curved regressions could be very easily
and accurately enough computed by simply smoothing the mar-
ginal of the independent variate with a normal curve or, event-
ually, a Pearson Type III curve, and correspondingly adjusting
the array frequencies. This method may work well even if the 
deviations of the actual distribution from the smoothed distribu- 


tion are systematical. 


Statistical Institute, University of Lund, November 1933. 













DIAGRAM 1 


Cubics 


[Figure: regression curves plotted against a horizontal axis in years, 6 to 22.]


Unbroken curve: Adjusted solution, the hypothetical marginal dis- 
tribution being the normal curve. 

Dotted curve: Unadjusted solution, the hypothetical marginal dis- 
tribution being the normal curve. 

Dashed curve: Pearson’s curve. 







DIAGRAM 2


Cubics 


Unbroken curve: Adjusted solution, the hypothetical marginal dis- 
tribution being the normal curve. 

Dotted curve: Unadjusted solution, the hypothetical marginal distri- 
bution being the normal curve. 

Dashed curve: Unadjusted solution, the hypothetical marginal distri- 
bution being the Pearson Type IV curve. 








[Figure: horizontal axis in years.]


DIAGRAMS 3 AND 4


Quartics


[Figures: horizontal axes in years.]

Unbroken curves: Unadjusted solutions. Dashed curves: Adjusted
solutions, the hypothetical marginal distribution being the Pearson Type
[remainder of caption illegible].


THE STANDARD ERROR OF ANY ANALYTIC 
FUNCTION OF A SET OF PARAMETERS EVAL- 
UATED BY THE METHOD OF LEAST SQUARES 


By 


WALTER A. HENDRICKS,


Bureau of Animal Industry, 
U. S. Department of Agriculture, Washington, D. C. 


After fitting a curve to a set of data by the method of least 
squares, it is occasionally necessary to use the resulting values of
some or all of the parameters of the curve in further calculations. 
Since the estimates of the values of the parameters obtained from 
a particular set of data are subject to errors of sampling, it fol- 
lows that the result of any calculation involving those values of 
the parameters will have a certain standard error. Since the 
estimated values of the parameters are not independent of each 
other, the familiar formulas based on the assumption of inde- 
pendence should not be used for the purpose of calculating this 
standard error from the standard errors of the parameters them- 
selves. The correct approach to the problem involves little more 
than an application of the methods presented by Schultz (1930) 
in his excellent paper describing the method of calculating the 
standard error of a particular function of the parameters, viz., 
the same function which was used in evaluating the parameters. 

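Let $y = F(A_1, A_2, \ldots, A_k)$ be an analytic function in-
volving the $k$ parameters, $A_i$. This function may not be linear
with respect to the parameters, so that if the parameters are to be
evaluated by the method of least squares, we have in the general
case a function of the form:


(1) $y = F(A_{1_0}, A_{2_0}, \ldots, A_{k_0}) + \frac{\partial F}{\partial A_1}\Delta A_1 + \frac{\partial F}{\partial A_2}\Delta A_2 + \cdots + \frac{\partial F}{\partial A_k}\Delta A_k + \cdots$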













from which the values of the parameters may be obtained by as- 
suming approximate values and calculating the corrections which 
must be added to obtain the most probable values. 
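After the values of the parameters have been obtained, let
it be required to find the standard error of a new function,
$z = G(A_1, A_2, \ldots, A_k)$, involving those values. If $z$ is an analytic
function of the parameters, we have to a close approximation:


(2) $z = G(A_{1_0}, A_{2_0}, \ldots, A_{k_0}) + \frac{\partial G}{\partial A_1}\Delta A_1 + \frac{\partial G}{\partial A_2}\Delta A_2 + \cdots + \frac{\partial G}{\partial A_k}\Delta A_k$.

Any error in $z$, beyond the negligible error due to the
above expansion, will then be due only to errors in $\Delta A_1, \Delta A_2,
\ldots, \Delta A_k$. Therefore, if

(3) $f = \frac{\partial G}{\partial A_1}\Delta A_1 + \frac{\partial G}{\partial A_2}\Delta A_2 + \cdots + \frac{\partial G}{\partial A_k}\Delta A_k$,

and $\sigma_z$ and $\sigma_f$ denote the standard errors of $z$ and $f$, respec-
tively, it is at once apparent that $\sigma_z = \sigma_f$.


The values of $\Delta A_1, \Delta A_2, \ldots, \Delta A_k$ may be expressed in
terms of the data from which they were evaluated,


(4) $\Delta A_1 = \alpha_1 M_1 + \alpha_2 M_2 + \cdots + \alpha_n M_n$
    $\Delta A_2 = \beta_1 M_1 + \beta_2 M_2 + \cdots + \beta_n M_n$
    $\cdots$
    $\Delta A_k = \omega_1 M_1 + \omega_2 M_2 + \cdots + \omega_n M_n$


in which the values, $M_i$, represent the $n$ observed values of the
variable, $y$; $f$ may then be expressed in the form:


(5) $f = \sum_{i=1}^{n}\left[\frac{\partial G}{\partial A_1}\alpha_i M_i + \frac{\partial G}{\partial A_2}\beta_i M_i + \cdots + \frac{\partial G}{\partial A_k}\omega_i M_i\right]$


From the ordinary laws of propagation of error and the
fact that $\sigma_z = \sigma_f$, it follows that


(6) $\sigma_z^2 = \sum_{i=1}^{n}\left[\frac{\partial G}{\partial A_1}\alpha_i + \frac{\partial G}{\partial A_2}\beta_i + \cdots + \frac{\partial G}{\partial A_k}\omega_i\right]^2 \sigma_y^2$


in which $\sigma_y$ is the standard error of estimate of $y$ based on $n-k$
degrees of freedom. If the right-hand member of equation (6)
is expanded, the equation may be written in the form:


(7) $\sigma_z^2 = \left\{\left(\frac{\partial G}{\partial A_1}\right)^2[\alpha\alpha] + \left(\frac{\partial G}{\partial A_2}\right)^2[\beta\beta] + \cdots + \left(\frac{\partial G}{\partial A_k}\right)^2[\omega\omega] + 2\left(\frac{\partial G}{\partial A_1}\right)\left(\frac{\partial G}{\partial A_2}\right)[\alpha\beta] + \cdots\right\}\sigma_y^2$


in which $[\alpha\alpha] = \alpha_1^2 + \alpha_2^2 + \cdots + \alpha_n^2$, etc.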


The values of the sums of the squares and products multi-
plying the differential coefficients in equation (7) may be obtained
from the normal equations formed for the evaluation of $\Delta A_1, \Delta A_2,
\ldots, \Delta A_k$. Let the normal equations be:

(8) $[aa]\Delta A_1 + [ab]\Delta A_2 + \cdots + [al]\Delta A_k = [aM]$
    $[ba]\Delta A_1 + [bb]\Delta A_2 + \cdots + [bl]\Delta A_k = [bM]$
    $\cdots$
    $[la]\Delta A_1 + [lb]\Delta A_2 + \cdots + [ll]\Delta A_k = [lM]$


If these equations are solved for $\Delta A_1, \Delta A_2, \ldots, \Delta A_k$ by the
method of undetermined multipliers, the first is multiplied by an
undetermined constant, $\alpha_1$, the second by $\alpha_2$, etc., and the resulting
products are added. The conditions for the solution for $\Delta A_1$ are:

(9) $[aa]\alpha_1 + [ab]\alpha_2 + \cdots + [al]\alpha_k = 1$
    $[ba]\alpha_1 + [bb]\alpha_2 + \cdots + [bl]\alpha_k = 0$
    $\cdots$
    $[la]\alpha_1 + [lb]\alpha_2 + \cdots + [ll]\alpha_k = 0.$

To solve for $\Delta A_2$, equations (8) are multiplied by $\beta_1, \beta_2, \ldots, \beta_k$,
respectively, added, and the following conditions imposed:

(10) $[aa]\beta_1 + [ab]\beta_2 + \cdots + [al]\beta_k = 0$
     $[ba]\beta_1 + [bb]\beta_2 + \cdots + [bl]\beta_k = 1$
     $\cdots$
     $[la]\beta_1 + [lb]\beta_2 + \cdots + [ll]\beta_k = 0.$

To solve for $\Delta A_k$, equations (8) are multiplied by $\omega_1, \omega_2, \ldots,
\omega_k$, respectively, added, and the following conditions imposed:













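(11) $[aa]\omega_1 + [ab]\omega_2 + \cdots + [al]\omega_k = 0$
     $[ba]\omega_1 + [bb]\omega_2 + \cdots + [bl]\omega_k = 0$
     $\cdots$
     $[la]\omega_1 + [lb]\omega_2 + \cdots + [ll]\omega_k = 1.$

It may be proved that:

(12) $\alpha_1 = [\alpha\alpha], \quad \alpha_2 = [\alpha\beta], \quad \ldots, \quad \alpha_k = [\alpha\omega]$
     $\beta_1 = [\beta\alpha], \quad \beta_2 = [\beta\beta], \quad \ldots, \quad \beta_k = [\beta\omega]$
     $\cdots$
     $\omega_1 = [\omega\alpha], \quad \omega_2 = [\omega\beta], \quad \ldots, \quad \omega_k = [\omega\omega].$

The method of deriving equations (12) is indicated in the well-
known text on the method of least squares by Merriman (1907)
in which a detailed proof of the fact that $\alpha_1$ is equal to $[\alpha\alpha]$ is
presented. The other relations may be derived in analogous fash-
ion. It may be observed that $[\alpha\beta] = [\beta\alpha]$, etc.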

The required quantities to be substituted in equation (7) 
may, therefore, be calculated by solving the sets of simultaneous 
equations, (9), (10), and (11). 
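The procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `solve`, `multipliers` and `variance_of_function` are hypothetical, the small Gauss-Jordan routine stands in for whatever hand method the computer prefers, and the 2 x 2 normal-equation matrix belongs to an invented straight-line fit at x = 1, 2, 3, 4, not to any data in this paper.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def multipliers(normal):
    """Solve the sets (9), (10), ..., (11): one unit right-hand side per parameter."""
    k = len(normal)
    return [solve(normal, [1.0 if i == j else 0.0 for i in range(k)])
            for j in range(k)]

def variance_of_function(grad, mult, sigma_y_sq):
    """Quadratic form of equation (7), using the relations (12)."""
    k = len(grad)
    return sigma_y_sq * sum(grad[i] * grad[j] * mult[i][j]
                            for i in range(k) for j in range(k))

# Hypothetical straight line y = A1 + A2*x observed at x = 1, 2, 3, 4:
# [aa] = 4, [ab] = 10, [bb] = 30.
normal = [[4.0, 10.0], [10.0, 30.0]]
mult = multipliers(normal)

# z = A1 + 2.5*A2 (the fitted value at the mean of x), with sigma_y^2 = 1.
var_z = variance_of_function([1.0, 2.5], mult, 1.0)
```

For this invented example the variance of the fitted value at the mean of x comes out as one quarter of the variance of a single observation, as expected for four observations.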

This completes the solution of the general problem presented 
in the first part of this paper. Some confusion may arise in re- 
gard to the proper application of the methods described above if 
one or both of the functions, y and z, happens to be in a linear
form with respect to the parameters. It may be shown that the
formulas given will hold in any of these special cases. Although
Taylor’s theorem may be applied to such functions, such a treat-
ment is superfluous. If either or both of the functions, y and z,
is linear with respect to the parameters, the expression for $\sigma_z^2$
is identical with equation (7) even though the linear function, or
functions, was not first expanded by Taylor’s theorem. Further-
more, if y is linear with respect to the parameters, the values of
the coefficients, $[\alpha\alpha]$, etc., in equation (7) will be the same re-
gardless of whether the parameters were evaluated directly or
whether y was first expanded. The latter statement is evident from




an inspection of equations (9), (10), and (11) and a considera- 
tion of the law of formation of normal equations. 

As an example of the application of the methods presented 
in this paper to a specific problem, consider a set of data given 
by Spillman (1933) relating to the yields of potatoes obtained 
from four plots of ground which had been treated with different 
amounts of potash. 


YIELDS OF POTATOES FROM FOUR PLOTS OF GROUND RECEIVING
DIFFERENT AMOUNTS OF POTASH


x (Units of K2O)          y (Bushels of potatoes)





When a simple exponential equation of the form,

(13) $y = A - Be^{-\kappa x}$

was fitted to this set of data, the most probable values of the
parameters, A, B, and κ, were found to be as follows:


A = 432.801 ± 11.637
B = 341.393 ± 11.406
κ = 0.6195918 ± 0.0462871.


The value of the product of the parameters, A and κ, hap-
pens to be of some interest, at least to the author of this paper,
since it gives the value of the first derivative of y with respect
to x at the point where the curve crosses the x-axis. In the pres-
ent example it represents the increase in yield, per unit increase
in amount of potash applied, which would be possible if certain 
inhibiting influences, which seem to be proportional to the yield, 
had no effect. For the particular data under consideration, the 
value of this product is 268.160. 













In order to calculate the standard error of this value, equa-
tion (7) was applied as follows:


(14) $\sigma_z^2 = \left(\kappa^2[\alpha\alpha] + A^2[\gamma\gamma] + 2A\kappa[\alpha\gamma]\right)\sigma_y^2$,


from which the standard error of Aκ was found to be equal to
± 13.331.

The familiar formula for calculating the standard error of
the product of two independent quantities, when employed for
the purpose of calculating the standard error of Aκ, may be writ-
ten in the form:


(15) $\sigma_z^2 = \left(\kappa^2[\alpha\alpha] + A^2[\gamma\gamma]\right)\sigma_y^2$.


Equation (15) gives a value of ± 21.291 for the standard error
of Aκ, which deviates considerably from the correct value given
by equation (14). The discrepancy is due entirely to the fact that
the estimated values of A and κ are not independent.
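The figures quoted in this example can be retraced from the reported parameter values alone. The sketch below is an illustration, not the author's computation: the function name `se_product` is hypothetical, and the covariance between A and κ is not printed in the paper, so it is backed out here from the reported standard error of 13.331 purely to show how large the dependence must be.

```python
import math

# Values reported in the text.
A, se_A = 432.801, 11.637
kappa, se_kappa = 0.6195918, 0.0462871

def se_product(a, k, se_a, se_k, cov_ak=0.0):
    """Standard error of a*k: equation (15) when cov_ak is zero,
    equation (14) when the covariance term is included."""
    var = k**2 * se_a**2 + a**2 * se_k**2 + 2.0 * a * k * cov_ak
    return math.sqrt(var)

product = A * kappa
se_independent = se_product(A, kappa, se_A, se_kappa)

# Assumption: covariance implied by the reported value 13.331 (not given
# in the source); it is recovered here only for illustration.
implied_cov = (13.331**2 - se_independent**2) / (2.0 * A * kappa)
implied_corr = implied_cov / (se_A * se_kappa)
se_correlated = se_product(A, kappa, se_A, se_kappa, implied_cov)
```

With these inputs `product` and `se_independent` agree with the values 268.160 and 21.291 given in the text, and the implied correlation between the two estimates is strongly negative, which is why the independence formula overstates the error.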


REFERENCES 


MERRIMAN, MANSFIELD. 1907. The Method of Least Squares. John 
Wiley & Sons, New York. 

SCHULTZ, HENRY. 1930. The standard error of a forecast from a curve.
Jour. Amer. Stat. Assoc., 25 (N.S. 17): 139-185.

SPILLMAN, W. J. 1933. Use of the exponential yield curve in fertilizer
experiments. U. S. D. A. Tech. Bull. 348.

























TRANSFORMATION OF NON-NORMAL FRE- 
QUENCY DISTRIBUTIONS INTO NORMAL 
DISTRIBUTIONS* 





By


G. A. BAKER 


This investigation is undertaken for two reasons: (1) there
has been a demand on the part of some statisticians for an analytic
method of transforming non-normal distributions into normal
distributions; and, (2) a non-normal distribution and the trans-
formation necessary to transform it into a normal distribution
serve to specify the distributions in random samples of estimates
of the parameters of the original distribution in terms of the
distributions of estimates of the parameters of a normal distribu-
tion in random samples. In this way valuable approximations to
the distributions of the parameters of the original non-normal
population may be secured.


PART I 
TRANSFORMATIONS OF FREQUENCY
DISTRIBUTIONS

Consider a non-normal frequency distribution represented by
$f(x)\,dx$ where the origin is taken at some central point, say the
mode, mean, or median, or near one of these points, and the scale
is, or approximately is, the standard deviation of the distribution.

We seek a function, $\varphi$, such that $f(x)\,dx$ transformed by the
transformation, $x = \varphi(u)$, becomes a normal distribution of total
area $\sqrt{2\pi}$, mean zero and standard deviation unity, i.e.




* Presented at the May, 1932 meeting of the Illinois section of the 
American Mathematical Association. 













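(1) $f[\varphi(u)]\,\varphi'(u)\,du = e^{-u^2/2}\,du$.


In a previous paper¹ expressions similar to (1) were regarded as
differential equations which can be solved exactly in certain special
cases. In the case of (1) it seems preferable to regard

(2) $f[\varphi(u)]\,\varphi'(u) = e^{-u^2/2}$

as an identity in $u$. If it is assumed that $f$ and $\varphi$ are functions
that can be represented by Maclaurin’s expansions, which is a
reasonable assumption regarding $f$ and $\varphi$ if $f$ is near normal,
the two members of (2) can be expanded and the coefficients of
corresponding powers of $u$ can be equated thus determining $\varphi$.
Suppose that

(3) $f(x) = A_0 + A_1x + A_2x^2 + \cdots$
(4) $\varphi(u) = B_1u + B_2u^2 + B_3u^3 + \cdots$
(5) $\varphi'(u) = B_1 + 2B_2u + 3B_3u^2 + \cdots$

Then (2) becomes

(6) $\left[A_0 + A_1(B_1u + B_2u^2 + \cdots) + A_2(B_1u + B_2u^2 + \cdots)^2 + \cdots\right]\left[B_1 + 2B_2u + 3B_3u^2 + \cdots\right] = 1 - \frac{u^2}{2} + \frac{u^4}{2\cdot4} - \frac{u^6}{2\cdot4\cdot6} + \frac{u^8}{2\cdot4\cdot6\cdot8} - \frac{u^{10}}{2\cdot4\cdot6\cdot8\cdot10} + \cdots$

Hence

$B_1 = \frac{1}{A_0}, \qquad B_2 = -\frac{A_1B_1^2}{2A_0}, \qquad B_3 = \frac{-\frac{1}{2} - 3A_1B_1B_2 - A_2B_1^3}{3A_0},$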


1 Transformations of Bimodal Distributions, Annals of Mathematical 
Statistics, Vol. I, No. 4, Nov. 1930. 





[The formulas for the higher coefficients, $B_4$, $B_5$, ..., are not legible in the source.]





The corresponding formulas for determining a function to 
transform a normal distribution of total area $\sqrt{2\pi}$, mean at zero
and standard deviation of unity, into a given non-normal distri- 
bution are as follows. (The As are the coefficients in the expan- 
sion of the given non-normal distribution and the Bs are the 
coefficients of the transforming function.) 


? 


B-A,, B-A, B-&+3 


These formulas give very simple results for the expression
of the first few terms of the transforming function, $\varphi$, in terms
of the coefficients of the given function. If the coefficients in the
expansion of $\varphi$ rapidly approach zero so that only a few terms
are needed for a good approximation the method outlined should





















be effective. Edgeworth has discussed at some length the trans-
formation or “translation” of normal distributions into non-nor-
mal distributions and has given several methods of determining
the coefficients of the transforming function.* The formulas pre-
sented here are more simple but their practicability can be demon-
strated only by numerical results in special cases. For practical
purposes the left-hand member of (6) need represent the right-
hand member accurately only in the interval, say $-2 \le u \le 2$.


ILLUSTRATION 


For example, consider

$f(x) = .9929\,(1 + x/10)^{99}\,e^{-10x}$,

which is skewed noticeably in the positive direction but which is
of a type that approaches a normal distribution as the skewness
approaches zero. Then


$A_0$ = .9929      $B_1$ = 1.0072
$A_1$ = -.100      $B_2$ = .0511
$A_2$ = -.4887     $B_3$ = .0050
$A_3$ = .0823      $B_4$ = -.0080
$A_4$ = .1142      $B_5$ = .0004
$A_5$ = -.0279
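The first three coefficients of the transforming function can be retraced from the tabulated As. The sketch below is only a check under stated assumptions: the formulas are obtained by equating the constant term and the coefficients of $u$ and $u^2$ in identity (2), the function name `first_three_B` is hypothetical, and the input values are the As as they are read from the table above.

```python
# Coefficients A_0, A_1, A_2 of the skew function, as read from the table.
A0, A1, A2 = 0.9929, -0.100, -0.4887

def first_three_B(a0, a1, a2):
    """Equate the coefficients of u^0, u^1 and u^2 in
    f(phi(u)) * phi'(u) = 1 - u^2/2 + ...  with
    f(x) = a0 + a1*x + a2*x^2 + ...  and  phi(u) = B1*u + B2*u^2 + B3*u^3 + ..."""
    b1 = 1.0 / a0                                   # a0*B1 = 1
    b2 = -a1 * b1**2 / (2.0 * a0)                   # 2*a0*B2 + a1*B1^2 = 0
    b3 = -(0.5 + 3.0 * a1 * b1 * b2 + a2 * b1**3) / (3.0 * a0)
    return b1, b2, b3

B1, B2, B3 = first_three_B(A0, A1, A2)
```

To the four decimals printed, these reproduce the tabulated 1.0072 and .0511 and come within one unit in the last place of .0050. The higher coefficients are not recomputed here.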


* Bowley, A. L.-F. Y. Edgeworth’s Contributions to Mathematical 
Statistics, pp. 65-78. 











TABLE I


Comparison of the ordinates of the normal function, function
with skewness of .2, and the skewed function transformed by the
transformation $x = 1.0072u + .0511u^2 + .0050u^3 - .0080u^4$

[The columns "Transformed skew curve†" and "Normal minus skew curve"
are not legible in the source; the last column below is read as
"Normal minus transformed skew curve", signs not legible.]

   u     Normal curve*   Function with skew .2*   Normal minus transformed skew curve

 -2.0      .053991            .049243
 -1.8      .078950            .076810                 .0120
 -1.6      .110921            .112956                 .0118
 -1.4      .149727            .157043                 .0218
 -1.2      .194186            .206951                 .0157
 -1.0      .241971            .259120                 .0085
  -.8      .289692            .308958                 .0058
  -.6      .333225            .351538                 .0033
  -.4      .368270            .382453                 .0093
  -.2      .391043            .398583                 .0004
   .0      .398942            .398859                 .0000
   .2      .391043            .383157                 .0005
   .4      .368270            .354545                 .0005
   .6      .333225            .316273                 .0033
   .8      .289692            .272360                 .0055
  1.0      .241971            .226714                 .0097
  1.2      .194186            .182641                 .0139
  1.4      .149727            .142563                 .0178
  1.6      .110921            .107939                 .0202
  1.8      .078950            .079354                 .0073
  2.0      .053991            .056702                 .0088

* These columns were taken from Luis R. Salvosa’s tables, Annals of 
Mathematical Statistics, Vol. I, No. 2, p. 64 et seq. 


† These values were calculated by interpolating in the above mentioned
tables. 


The ordinates of the normal curve, $f(x)$, and $f(x)$ trans-
formed by the transformation determined by the first four Bs
are compared in Table I. 

The ordinates of the transformed distribution are much 
nearer those of the normal curve over an interval that includes 
seventy-five per cent of the frequency but for the rest of the 
range considered the agreement is not so good. These facts indi- 













cate that more terms of the transforming function must be taken 
in order to secure close results for large values of |u| . 

It is difficult to set up a rigorous criterion as to the number 
of Bs necessary to define adequately the transforming function, 
but the following considerations are of value in this connection. 

Suppose that $f(x)$ may be adequately represented in the
interval $a \le x \le b$ by $m$ terms, i.e.

$f(x) = A_0 + A_1x + A_2x^2 + \cdots + A_{m-1}x^{m-1}$

and that $m$ is large enough so that the first $m$ terms of the ex-
pansion of the normal function give an adequate representation
of it. This is clearly possible since the expansions with which
we are dealing converge uniformly in the open interval. Then the
first $m$ Bs may be determined so that the first $m$ terms of
$f(x)\,dx$ transformed by the transforming function determined
by the $m$ Bs are identical with the first $m$ terms in the
expansion of the normal function. In addition there will remain
certain terms which may cause a serious discrepancy. For $f(x)\,dx$
becomes

$A_0(B_1 + 2B_2u + \cdots) + A_1(B_1u + B_2u^2 + \cdots)(B_1 + 2B_2u + \cdots) + \cdots + A_{m-1}(B_1u + B_2u^2 + \cdots)^{m-1}(B_1 + 2B_2u + \cdots).$

Let us assume that all $B_k = 0$, $k > m$, and investigate the
terms in $u$ of degree higher than $m$.

Since the first terms of $f(x)\,dx$ transformed contribute few
terms involving $u^k$, $k > m$, and the higher order terms have
small coefficients, it is to be expected that if $m$ Bs are used a
good result will be obtained, at least for moderate values of $u$.

Some skewed distributions that differ considerably from 
normal may yield a rapidly converging sequence of Bs , that is in 
case there is a natural relation of this kind existing between the non- 
normal and normal distributions. 

The main reason for investigating the possibility of an easily 
determined transforming function that will transform a non- 

















normal distribution into a normal distribution is the fact that the 
distributions in random samples of estimates of the parameters 
of the non-normal distribution can be expressed in terms of the 
transformation and the sampling distributions of the parameters 
of the normal distribution into which the non-normal distribution 
is transformed. This proposition is developed in Part II. 


PART II 


DISTRIBUTION OF THE ESTIMATES OF THE 
PARAMETERS OF NON-NORMAL 
DISTRIBUTIONS 


Suppose that a variable $x$ is distributed as $f(x)\,dx$ where
$f(x)$ is such that it can be transformed into a normal distribution
by means of a quadratic transformation,

(1) $x = ay + by^2$.

Then $f(x)\,dx$ becomes

(2) $f(ay + by^2)(a + 2by)\,dy$,

where $y$ is normally distributed.


The total of a sample of $n$ xs drawn at random from $f(x)$ is

(3) $(x_1 + x_2 + \cdots + x_n)$,

which by virtue of (1) becomes

(4) $a(y_1 + y_2 + \cdots + y_n) + b(y_1^2 + y_2^2 + \cdots + y_n^2)$.

The coefficient of $a$ in (4) is an estimate of the total of a sample
of $n$ of a normally distributed variable and the coefficient of $b$ is
an estimate of the second moment about a fixed point of a nor-
mally distributed variable which can be written as an estimate
of the standard deviation squared plus the estimate of the mean
squared. Thus (4) can be written as

$na\bar{m} + nb(\bar{\sigma}^2 + \bar{m}^2)$,

























where the bar over $m$ and $\sigma$ denotes estimates of these para-
meters by means of samples. The distributions of $\bar{m}$ and $\bar{\sigma}$ are
known and are independent. If the mean of distribution (2) is
taken to be zero, then $\bar{m}$ is distributed as proportional to $e^{-n\bar{m}^2/2}$
and $v = a\bar{m} + b\bar{m}^2$ is distributed as proportional to

t 

1 


‘= nlatr2bytaVa'ruby ) 
& 2 
Va +4Ey ; 


The distribution of 6%? is, except for a constant factor, 


(5) € 





(6) ¢ e@ os Bio 


if 72. 
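The rewriting of the sample total (4) in terms of the estimated mean and standard deviation of the ys is an algebraic identity and can be checked on any sample. The sketch below is illustrative only: the function name `sample_total_two_ways`, the sample size, the seed and the constants a and b are arbitrary assumptions, and the standard deviation is taken with divisor n.

```python
import random

def sample_total_two_ways(ys, a, b):
    """Total of x = a*y + b*y^2 over a sample, computed directly and
    from the sample mean and (divisor-n) variance of the y's."""
    n = len(ys)
    direct = sum(a * y + b * y * y for y in ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    via_moments = n * a * mean + n * b * (var + mean ** 2)
    return direct, via_moments

random.seed(0)                      # arbitrary seed, for repeatability only
ys = [random.gauss(0.0, 1.0) for _ in range(50)]
direct, via_moments = sample_total_two_ways(ys, a=1.0, b=0.05)
```

The two totals agree to rounding error, so the sampling distribution of the total of the xs is fixed once the joint distribution of the mean and standard deviation of the ys is known.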

If two variables, $x$ and $y$, are distributed as $F(x, y)\,dx\,dy$,
then the probability that a value of $\varphi(x, y) = v$
is in $dv$ is given as the surface area of the cylinder $\varphi(x, y) = v$ be-
tween $z = F(x, y)$ and $z = 0$ times $dv$.

In this case the probability function of VaytzZ is pro- 


a’+2byt+aVa+4b 
Vv a ahtrstpeets Sal n-3 - 2G 


¢ vy d 
2 Var+uty ,” 


Put JY =axttx and (7) becomes 


portional to 





-atVa*syey 
2 
~~ f 77-3 
at We x Zz 2 
(8) € e” - [v-ax-6x*] dx, Zgvgoo. 
—-a-Var+-4hu" 


2é 





* Baker, G. A.--Random Sampling from Non-Homogeneous Popula- 
tions, Metron, Vol. VII, No. 3, Feb. 1930. 













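If $f(x)$ can be transformed into a normal distribution by
means of a cubic transformation,


(9) $x = ay + by^2 + cy^3$

then (3) becomes

(10) $a(y_1 + y_2 + \cdots + y_n) + b(y_1^2 + y_2^2 + \cdots + y_n^2) + c(y_1^3 + y_2^3 + \cdots + y_n^3)$

which can be written as $n(a\bar{m} + b\bar{m}^2 + c\bar{m}^3 + 3c\bar{\sigma}^2\bar{m} + b\bar{\sigma}^2)$.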


Hence the means of samples of $n$ are distributed as proportional


to 
7-3 _mx n V-ax-6x -cx 
Jie -ax-bx ex - e 2 p | eae a eae 3CXtE ) 
3COX+E 
a1 


VGcxt+b) + (60°x74+66ex “42 b+ abr seu) 4 


C3¢x+6)* 


where ¥/4 represents the interval or intervals for which 


$v - ax - bx^2 - cx^3$ and $3cx + b$ have the same signs and
$v$ varies from $-\infty$ to $+\infty$.
Suppose that the given frequency distribution can be trans-
formed into a normal distribution by the transformation
$x = ay + by^2$,
and consider the expression for the estimation of the standard
deviation squared of the x-distribution from a sample of $n$,


(12) (2G"+ tt ee) pee tet 
a we 




which becomes 


2 
(13) [ay +6y7)+-+ay, + €y')] [arta e~t0a Yn! €o) 
SSS ae as os a a 


where $y$ is normally distributed. In terms of the estimates of the
mean and standard deviation of the ys (13) can be written as


2.4 az . as 
2tF+a GF +40b6F M+d4bF Mm. 
Hence the estimates of the standard deviations of the original 
population will be distributed as proportional to 


JS ~F [ala rtotx)+ Vas yabxrghe +3 6 (ai-tabx)+ V@ivabxryee)is ev | 
e 


46* 


————————————— 


77-3 
" oe +" 
a +462) +V (asyabxt4bd) +5 by | 


ee 


(a+yabx+4b'x *4ab+at'x)) 


4E°V (a +4abx+4O') + av 
where $v$ varies from $0$ to $\infty$.


x. 


——LL——_—_—EES eee 


+ aS eS Se 


The distributions of the estimates of other parameters and
the distributions of the estimates of the mean and standard devi-
ation for different transformations can be expressed in terms of
the distributions of the mean and standard deviation of the re-
sulting transformed normal distribution but it is obvious that the
process becomes complicated.











INVARIANTS AND COVARIANTS OF CERTAIN 
FREQUENCY CURVES 


by 


RICHMOND T. ZOCH


Introduction. After the most convenient type of equation
$y = f(x, a, b, c, \ldots)$ has been selected and the parameters a, b,
c, ..., in the selected equation have been determined so that for
a given set of values $x_t$ (t = 1, 2, ..., m), the computed values
$y_t$ (t = 1, 2, ..., m) agree as closely as possible or as closely as
is consistent with the observed values $Y_t$ (t = 1, 2, ..., m), it may
be desirable to make one or more of the transformations: (1)
move the origin, (2) use a different scale (unit of measure),
(3) change the total frequency.

This paper discusses certain invariants and covariants of the 
above transformations which were noted in developing the general 
theory for the Pearson Curves of frequency. 

1. Change of Origin. Instead of considering the diff. eq.,

(1) $\frac{dy}{dx} = \frac{y(x - P)}{b_2x^2 + b_1x + b_0}$

which is the diff. eq. from which the Pearson curves are derived,
we take the more general diff. eq.,

(2) $\frac{dy}{dx} = \frac{y(x - P)}{b_nx^n + b_{n-1}x^{n-1} + \cdots + b_1x + b_0}$.

Equation (1) is a special case of equation (2).
Make the following substitutions:

(3) $x = X + P, \qquad b_n = B_n, \qquad nPb_n + b_{n-1} = B_{n-1},$
    $\frac{n(n-1)}{2}P^2b_n + (n-1)Pb_{n-1} + b_{n-2} = B_{n-2}, \quad \ldots,$







and on simplifying we obtain,

(4) $\frac{dy}{dX} = \frac{yX}{B_nX^n + B_{n-1}X^{n-1} + \cdots + B_1X + B_0} = \frac{yX}{F(X)}$.

If we now write

(5) $X = x - P$

we have:

(6) $\frac{dy}{dx} = \frac{y(x - P)}{B_n(x - P)^n + B_{n-1}(x - P)^{n-1} + \cdots + B_1(x - P) + B_0}$.

The solutions of equations (4) and (6) can be written in
the form

(7) $y = G(X) = G(x - P)$,

where $P$ is the mode as will be observed from the diff. eq. In
other words the frequency function is a function of $(x - P)$
when it is written in the form of eq. (7). Therefore if we change
the origin of $x$ by writing $x' = x - h$ all of the constants of the
frequency curve will remain unchanged if at the same time $P$ be
subjected to the transformation $P' = P - h$.
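The substitutions (3) amount to re-expanding the denominator polynomial about the point $P$. The following sketch checks this numerically; it is an illustration under stated assumptions, with a hypothetical cubic denominator, a hypothetical value of $P$, and function names (`shift_coefficients`, `evaluate`) that do not come from the paper.

```python
from math import comb

def shift_coefficients(b, P):
    """Given b[0..n] with b(x) = sum b[j] * x**j, return B[0..n] such that
    b(X + P) = sum B[k] * X**k (binomial re-expansion about P)."""
    n = len(b) - 1
    return [sum(comb(j, k) * P ** (j - k) * b[j] for j in range(k, n + 1))
            for k in range(n + 1)]

def evaluate(coeffs, t):
    """Evaluate a polynomial given by ascending coefficients."""
    return sum(c * t ** i for i, c in enumerate(coeffs))

# Hypothetical cubic denominator and mode, for illustration only.
b = [2.0, -1.0, 0.5, 0.25]      # b_0, b_1, b_2, b_3
P = 1.5
B = shift_coefficients(b, P)
n = len(b) - 1
```

Here `B[n]` equals `b[n]`, `B[n-1]` equals `n*P*b[n] + b[n-1]`, and `evaluate(b, x)` agrees with `evaluate(B, x - P)` for any `x`, which is the content of equations (4) to (6).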

2. Change of Total Frequency. Let $C_0$ be the constant of
integration when the area under the curve is unity and when the
argument is $X = x - P$; $K_0$ the constant of integration when the
argument is $X = x - P$ for an arbitrary area under the curve; and
$N$ the total frequency. Now when the total frequency is changed
the area under the curve is changed, hence from the above defini-
tions

(8) $K_0 = NC_0$.

Therefore if the total frequency be $N$ and it is desired to write
the equation of the frequency function for a total frequency of
$N'$ then write $K_0'$ for $K_0$ where

(9) $K_0' = K_0\frac{N'}{N}$

and leave all of the remaining constants unchanged.
It should be emphasized that in leaving the remaining con-
stants unchanged we assume that the distribution of the new
sample or the universe obeys the same law as the old sample.
Occasionally one sees the statement in works on probability and
statistics in connection with the Theory of Errors that as the
number of observations is increased indefinitely, the arithmetic
mean tends to the true value of a distribution. This statement is
based upon the tacit assumption that an observation less than the
true value (most probable) is as likely to occur as an observation
greater than the true value. If we make this assumption we will
always (if the number of observations be sufficiently large) ulti-
mately obtain a symmetrical frequency curve (the A. M. coincides
with the axis of symmetry) and this assumption contradicts the
assumption that the distribution of the new sample obeys the same
law as the old sample (except the old sample itself be symmetric-
ally distributed).

3. Change of Scale. We are now ready to consider the 
behavior of the constants when the unit of measure is changed. 
Perhaps it is well to point out here that quite often it is desirable 
to change the unit from months to years, from feet to yards, from 
pounds to grams, etc. The behavior of the constants under a 
change of scale is not as easily arrived at as for the changes of 
the origin and total frequency. 

The behavior of $B_n$, where $B_n$ is the coefficient of the high-
est power of $X$ in $F(X)$ of the differential equation, $\frac{dy}{dX} = \frac{yX}{F(X)}$,
will first be obtained.

Elderton¹ uses moments to determine the constants of a fre-
quency curve. Thorkelsson² and Fisher³ have used Thiele’s semi-
invariants for this purpose. Semi-invariants have an advantage
over moments in that the values of the higher semi-invariants do
not change when the origin is changed. Moreover Fisher (pp.
12-16, loc. cit.) has pointed out how the semi-invariants behave
when the unit is changed, viz:


$\lambda_1(ax + c) = a\lambda_1(x) + c$
$\lambda_i(ax + c) = a^i\lambda_i(x)$ for $i > 1$.


¹ W. Palin Elderton, “Frequency Curves and Correlation”, Second
Edition 1927, London.
² Thorkell Thorkelsson, “Frequency Curves Determined by Semi-In-
variants” (Visindafelag Islendinga IX) Reykjavik, Rikisprentsmidjan Gu-
tenberg, MCMXXXI.
³ Arne Fisher, “Frequency Curves”, Translated by E. A. Vigfusson,
American Edition, 1922, The Macmillan Co.
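The scaling rule for the semi-invariants quoted from Fisher can be verified on a small sample for the first three orders, where the semi-invariants coincide with the mean and the second and third central moments. The sketch is illustrative only: the data and the constants a and c are arbitrary, and the function name `semi_invariants` is hypothetical.

```python
def semi_invariants(xs):
    """First three semi-invariants of the empirical distribution of xs:
    the mean and the second and third central moments (divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    second = sum((x - mean) ** 2 for x in xs) / n
    third = sum((x - mean) ** 3 for x in xs) / n
    return mean, second, third

xs = [1.0, 2.0, 2.0, 3.0, 7.0]     # arbitrary illustrative observations
a, c = 12.0, 5.0                   # e.g. years to months, with a shifted origin

l1, l2, l3 = semi_invariants(xs)
t1, t2, t3 = semi_invariants([a * x + c for x in xs])
```

The transformed values satisfy `t1 = a*l1 + c`, `t2 = a**2 * l2` and `t3 = a**3 * l3`, so only the first semi-invariant feels the change of origin.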


Referring to equation (2) let $P'$ be the value of $P$ when
the origin is at the arithmetic mean, and let $b_n', b_{n-1}', \ldots, b_1'$,
and $b_0'$ be the values of $b_n, b_{n-1}, \ldots, b_1$, and $b_0$ when the
origin is at the arithmetic mean. Now Thorkelsson (loc. cit.) has
pointed out that when his method is used for computing the con-
stants of the curve there will be only one equation involving
$P'$ and only one equation involving $b_0'$. Moreover the coefficients
of the $(b')$’s and the constant terms of the remaining equations
will be of constant weight.

Below is an example of the equations obtained when Thor- 
kelsson’s method is used to compute the constants: 


“e+ &'+3A,& =0 
A, + be +5 d, 6 +4, b= 0 
A,+ 21,8 + Wrg6 + (sd, t1ar-) fe! - 


yt BA E+E ee +(e A,r45),>,) b'=0 


Lg + 4A Or (ehr2s dre enh) e50van) bl 













Note that only the first of the above equations involves $P'$ and
only the second involves $b_0'$.

Since the coefficients are of constant weight they are invari-
ants⁴ of index $w$, where $w$ is the weight of the coefficient, when
$x$ is subjected to the transformation $x' = ax + c$.

Suppose that we now consider the general case where $F(X)$
is of degree $n$. Hence, in general, equations (10) will consist of
$n + 2$ equations in $n + 2$ unknowns; the unknowns being $P'$,
$b_0'$, $b_1'$, ..., $b_n'$. Disregard the two equations which involve $P'$
and $b_0'$; then there remain $n$ equations in $n$ unknowns. Observe
that the weights of the coefficients of the $b'$ form an A.P. whether
taken by rows or by columns. Also the weights of the constant
terms form the same A.P. as the columns.

We now state the 

Lemma: If all of the elements of a determinant are covari- 
ants and the weights (indices) of the elements of every row form 
an A.P. and of every column form an A.P. then when the deter- 
minant is expanded every term is of constant weight (index). 

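Proof: Let the A.P. formed by the weights of the elements 
of the rows be 

$$w_{ij} = a_i + (j-1)\,d, \qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, n.$$

Then the weights of the elements can be displayed as follows: 

$$\begin{array}{cccccc}
a_1 & a_1+d & a_1+2d & a_1+3d & \cdots & a_1+(n-1)d \\
a_2 & a_2+d & a_2+2d & a_2+3d & \cdots & a_2+(n-1)d \\
a_3 & a_3+d & a_3+2d & a_3+3d & \cdots & a_3+(n-1)d \\
\vdots & & & & & \vdots \\
a_n & a_n+d & a_n+2d & a_n+3d & \cdots & a_n+(n-1)d
\end{array}$$
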

(It should be emphasized that the above is not the determinant 
mentioned in the statement of the Lemma but the elements of the 


above array represent the weights of the elements of the determi- 


4L. E. Dickson. “Modern Algebraic Theories”, Benj. H. Sanborn & 
Co., 1926; Chicago. Chapter I. 
















nant). Now since by hypothesis the elements of every column 
have such weights that the weights form an A.P. then $a_1, a_2, a_3, \ldots, a_n$ 
must form an A.P. Let this A.P. be $a_i = a_1 + (i-1)\,\delta$. 

Making use of this notation the weights of the elements of the 
determinant can be displayed as follows: 

$$\begin{array}{cccc}
a_1 & a_1+d & \cdots & a_1+(n-1)d \\
a_1+\delta & a_1+\delta+d & \cdots & a_1+(n-1)d+\delta \\
\vdots & & & \vdots \\
a_1+(n-1)\delta & a_1+(n-1)\delta+d & \cdots & a_1+(n-1)\delta+(n-1)d
\end{array}$$

Hence the weight of the element in the $i$th row and the $j$th column 
is $a_1 + (i-1)\,\delta + (j-1)\,d$. Along the principal diagonal of the 
determinant $i = j$. Therefore when the determinant is expanded 
the weight of the term consisting of the elements of the principal 
diagonal is the sum of the A.P. $w_{ii} = a_1 + (i-1)(d+\delta)$ or 

$$\sum_{i=1}^{n} w_{ii} = \frac{n}{2}\left[2a_1 + (n-1)(d+\delta)\right] = W.$$


Every term in the expansion is of weight W because each term 
consists of one element from each row and one element from each 
column and hence the weight is equal to the sum of two series, 
each being an A.P., plus the weight of the term in the upper left 
corner. 
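The Lemma can be checked by brute force for a small determinant. This is only an illustrative sketch: the order `n`, the corner weight `a1` and the two common differences `d` and `delta` below are arbitrary values chosen for the check, not quantities from the paper. Each term of the expansion corresponds to a permutation that picks one element from every row and every column.

```python
from itertools import permutations

def weight_matrix(n, a1, d, delta):
    """Weights a1 + i*delta + j*d for row i and column j (both 0-based)."""
    return [[a1 + i * delta + j * d for j in range(n)] for i in range(n)]

def term_weights(w):
    """Set of total weights over all terms of the determinant expansion."""
    n = len(w)
    return {sum(w[i][p[i]] for i in range(n)) for p in permutations(range(n))}

# Arbitrary illustrative values (assumptions, not taken from the paper).
n, a1, d, delta = 4, 2, 1, 3
weights = term_weights(weight_matrix(n, a1, d, delta))
expected = n * (2 * a1 + (n - 1) * (d + delta)) // 2

print(weights, expected)  # a single common weight, equal to the closed form
```

Every permutation uses each row index and each column index exactly once, so the sum is the same for all terms, which is the content of the Lemma.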

THEOREM: If all of the coefficients and the “constant” terms 
of a system of $n$ linear equations in $n$ unknowns are covariants 
of such respective weights (indices) that the weights (indices) of 
the elements of every row of the matrix of the system of equa- 
tions form an A.P. and of the elements of every column of the 
augmented matrix form an A.P. then the solutions are covariants 
whose weights (indices) form an A.P. whose common difference 
is of the same magnitude but of opposite sign to the common differ- 
ence in the A.P. of the weights (indices) of the elements of the 
rows. 






































Proof: By Cramer’s rule the solutions are 

$$z_i = \frac{D_i}{\Delta} \quad \text{where} \quad \Delta = \begin{vmatrix} K_{11} & \cdots & K_{1n} \\ \vdots & & \vdots \\ K_{n1} & \cdots & K_{nn} \end{vmatrix}$$

and where $D_i$ is the $n$-rowed determinant obtained from $\Delta$ by replacing 
the elements of the $i$th column by the “constant” terms of the 
system. Let the weight (index) of the element in the $i$th row 
and $j$th column of $\Delta$ be $a_1 + (i-1)\,d + (j-1)\,\delta$. Also let the 
A.P. formed by the weights (indices) of the elements of the $i$th 
row of $\Delta$ be $w_{ij} = a_i + (j-1)\,\delta$, hence in particular the A.P. 
of the weights (indices) of the elements of the first row is 
$a_1 + (j-1)\,\delta$. Further let the A.P. formed by the elements of the 
column of constant terms of the augmented matrix be $w_{i0} = a_0 + (i-1)\,d$. 
By the lemma just established we see that when $\Delta$ is expanded all 
of the terms of the expansion will be of the same weight (index) 
$W$. Hence $\Delta$ is of weight (index) $W$. Also since the A.P. 
of the weights (indices) of the column of constant terms is 
$w_{i0} = a_0 + (i-1)\,d$ then the weight (index) of each term in the 
expansion of $D_i$ will be $-[a_1 + (i-1)\,\delta] + a_0$ different from 
the weight (index) of each term in the expansion of $\Delta$. Hence 
the weight (index) of $D_i$ is $W - [a_1 + (i-1)\,\delta] + a_0$. Therefore 
the weight (index) of $z_i$ is $W - [a_1 + (i-1)\,\delta] + a_0 - W = a_0 - a_1 - (i-1)\,\delta$ 
and the theorem is established. 

Applying the above Theorem and observing that $\delta = 1$ in 
equations (10) we obtain the result: 

weight of $b_n' = 3 - 2 + (n-1)(-1) = 2 - n$. 

Since $\beta_n = b_n'$, we have the result that when $x$ is subjected to the 
transformation $x' = ax + c$ then $\beta_n$ becomes $a^{2-n}\beta_n$. Or in other 
words $\beta_n$ is an invariant of index $2-n$ under the transformation 
$x' = ax + c$. 


Now we turn to the consideration of $P'$. Here we have 
$n+1$ equations in $n+1$ unknowns and the augmented matrix 
has elements of the following weights (* means that an element 
is lacking): 






P 


o 






me 










e (nes) (ntz) (n43) (+4) <- ~ & 2n (nt2) 





Now $P'$ is the quotient of two determinants formed from 
the above matrix and if these two determinants be expanded in 
terms of the minors of the first column we see that the weight 
of $P'$ is $(W+1) - W = 1$. That is, $P'$ is of weight 1 regardless 
of the degree of $F(X)$. Therefore $P'$ is an invariant of index 1. 

Next considering $b_0'$ the augmented matrix has the same 
elements as for $P'$ except that the first row is now: 


t, - z 3 fires se + wes 2 
Following the same procedure we see that the weight of $b_0'$ is 
$(W+2) - W = 2$. Therefore $b_0'$ is an invariant of index 2. 

We can look upon equations (3) as a transformation. We 
can reverse this transformation by solving for the $b_i$ in terms of 
the $\beta_i$. Also, by moving the origin to the A.M. equations (3) 
may be written: 



















P4' +t 


(11) “2 ™- ’ "7-7 ‘.. o “?-t 7-2 
















' f 
In equations (11) $P'$, $b_0'$, $\ldots$, $b_n'$ are the values of $P$, 
$b_0$, $\ldots$, $b_n$ when the origin is at the A.M. 

Note that the right hand members of equations (11) are isobaric 
and that $\beta_0$ is of weight 2; $\beta_1$ of weight 1; $\beta_2$ of weight 0; 
and in general $\beta_i$ is of weight $2-i$. 


/ ’ 


Now let $g_0 = \dfrac{\beta_0}{\beta_n}$, $g_1 = \dfrac{\beta_1}{\beta_n}$, 
and in general $g_i = \dfrac{\beta_i}{\beta_n}$; hence 
$g_n = 1$. Therefore when the $g$'s are computed we note that $g_0$ is of 
weight $n$; $g_1$ is of weight $n-1$ and in general $g_i$ is of weight 
$(2-i) - (2-n) = n-i$. Since $g_0$ is the product of all the roots, 
$g_1$ the sum of the products taken $(n-1)$ at a time and so on and 
$g_{n-1}$ is the sum of all the roots (due consideration being taken 
of the signs) it follows that all the roots of $F(X)$ are invariants of 
index 1 under the transformation $x' = ax + c$. 

Now if equation (4) be solved in the form of equation (7) 
then it can be seen by actual substitution of the indices of 4 and 
the roots of -(X ) which are involved in the constants that the 


n- P)* 
exponents K and € of factors of the form (i- x ) and 


ss 


os 


nm 


z 


Zé arc tan x-Pt+As 
- “ 


— re invariants of index zero. The 
[i+ 


x-P+2,) )]*/ 
(AR) ] 
K 
factor (’- =f) occurs for a real root % of F(X) and the factor 


&é arc tan Stet 
é : 
wa ge — CONES for a pair of conjugate com- 
fis _ fee) 7 
A, 
plex roots of (x q) . The fact that the exponents K are invariants 


of index zero will be generalized for the case where complex 
roots do not occur and where no real root is repeated. 








If complex roots do not occur the differential equation 
; XdX 1 fm - 
dy. ¥ X can be written 7 « 4. merce 
Tk FR) ¥ ° Ftx) 8 bea* ia, 


where in separating 
coefficients of like powers of X we obtain n equations in m un- 





into partial fractions and equating 


knowns:and since the roots are all of weight 1 the weights of the 
augmented matrix will be (the unknowns of the system are the 
™,; ) : 


”-) n-1 77-1 'i<— es we eS & @ 7-0 ean 
W-2Z2 7-2 : ae +s + + * © + 8 [yee oO 
oseeaeewt w© 6 © ew wo eo ee ee Owe 6 & © oo S&S @ @ es oe 
oO oO oO ‘ e + . . + . . oO +* 


Applying the Lemma we see that the 77, are all of the same 
weight (since J = 0). Expanding the determinant in the numer- 
ator by minors we see that the 77, are of weight 7-2. Since &, 
is of weight 2-77, we have Z&=k,- is of weight zero. There- 
fore the K; are invariants of index zero under the transformation 


x! - @&xzeec. 

We have now considered all of the constants of the curves 
except the constant of integration. Let the solutions be written in 
the form: $y = C_1\,G(X)$. 

Now it is possible to write G(X) in such a form that G(X ) 
is a covariant of index zero under the transformation X’=a X . 
In the case of real roots this is accomplished by dividing both the 
numerator and the denominator of each partial fraction by the 
root involved in the fraction before the integration is performed. 
Partial fractions which involve complex roots can be similarly 
treated. This is the way Pearson actually treated his Types I, II, 
III, IV, and VII curves although he did not deal with his Types 
V and VI curves in this manner. 

After we have our solution in the form which makes $G(X)$ a 
covariant of index zero then if we write $X'$ for $aX$ the total 
frequency between $nX'$ and $(n+1)X'$ will be the same as the total 
frequency between $naX$ and $(n+1)aX$. Therefore $y$ is a covariant 
of index (-1). Hence $C_1$ is an invariant of index 
(-1). 

An example will now be given. Take the equation to which 
Elderton (loc. cit.) fits a Type I curve on pages 54-59. He has 
used a unit of 5 years. Suppose we wish to change to a unit of 
1 year. Then the constants $a_1$ and $a_2$, being the roots of $F(X)$, are 
invariants of index 1 and are each multiplied by 5 and become 
9.98190 and 67.63640 respectively. Since $m_1$ and $m_2$ are invariants 
of index zero they remain unchanged and are as he gives 
them viz. .409833 and 2.776978. The constant of integration, being 
an invariant of index -1, is divided by 5 and becomes 29.892. 
The equation with a unit of 1 year becomes (see top of page 58): 

$$y = 29.892\left(1 + \frac{x}{9.98190}\right)^{.409833}\left(1 - \frac{x}{67.63640}\right)^{2.776978}$$

Suppose that now we wish to move the origin to age 26.75942. Then 
the above equation becomes: 

$$y = 29.892\left(1 + \frac{x - 26.75942}{9.98190}\right)^{.409833}\left(1 - \frac{x - 26.75942}{67.63640}\right)^{2.776978}$$

Finally suppose we wish to change to a total frequency of 2000 
instead of 1000 as in the given sample. Then the equation becomes: 

$$y = 59.784\left(1 + \frac{x - 26.75942}{9.98190}\right)^{.409833}\left(1 - \frac{x - 26.75942}{67.63640}\right)^{2.776978}$$
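The change of unit in this example can be sketched in a few lines. The function names below are hypothetical, and the 5-year constants are obtained here simply by applying the stated indices in reverse to the 1-year constants quoted in the text (they are not copied from Elderton). The total frequency is evaluated in closed form with the Beta function, assuming the curve has the form $y = y_0 (1 + x/a_1)^{m_1} (1 - x/a_2)^{m_2}$ on $(-a_1, a_2)$.

```python
import math

def rescale_type1(y0, a1, a2, m1, m2, a):
    """Constants after x' = a*x: index -1 for y0, 1 for the roots, 0 for exponents."""
    return y0 / a, a1 * a, a2 * a, m1, m2

def total_frequency(y0, a1, a2, m1, m2):
    """Area under y0*(1+x/a1)**m1*(1-x/a2)**m2 over (-a1, a2) via the Beta function."""
    beta = math.gamma(m1 + 1) * math.gamma(m2 + 1) / math.gamma(m1 + m2 + 2)
    span = a1 + a2
    return y0 * span * (span / a1) ** m1 * (span / a2) ** m2 * beta

# Constants with a unit of 1 year, as quoted in the text.
one_year = (29.892, 9.98190, 67.63640, 0.409833, 2.776978)
# Going back to a unit of 5 years corresponds to a = 1/5.
five_year = rescale_type1(*one_year, a=1 / 5)

total_one = total_frequency(*one_year)
total_five = total_frequency(*five_year)
print(five_year)
print(total_one, total_five)  # the total frequency is unchanged by the change of unit
```

Doubling the constant of integration, as in the last step of the example, doubles the total frequency and leaves the other constants alone.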


4. Conclusion; Benefits of this Information. If the diff. eq. 
(2) be written in the form (4) by means of the transformation 
(3) then the integration is more easily accomplished. That is to 
say: in general the solution in the form of eq. (7) is more readily 
obtained from (4) than some equivalent form of solution would 


be from (2). Thus a solution in the form of eq. (7) is not only 
more easily obtained but also lends itself readily to a change of 
origin. 

Each type of Pearson’s Curves may be written in a number 
of ways. The numerical example given above shows the convenience 
and advantage of writing a solution so that the origin is 
at the mode, $G(X)$ is a covariant of index zero, $y$ a covariant 
of index (-1) and the constant of integration an invariant of 
index (-1). 

Regardless of what form is selected for writing a solution 
the solution will be a covariant and the constants will be invariants, 
but not necessarily of the indices mentioned above. A 
knowledge of these invariants will save much labor if it is desired 
to make a change in the unit of measure. 

Similar laws of transformation can be worked out for (1) 
solutions of the diff. eq. $\dfrac{1}{y}\dfrac{dy}{dx} = \dfrac{f(x)}{F(x)}$ where both $f(x)$ and $F(x)$ 
are integral rational functions of $x$ and (2) for the Gram-Charlier 
Types A and B series. In the first case we obtain the same 
result as outlined above for the simpler diff. eq.: that 
is the solution may be written in the form $y = C_1\,G(X)$ where 
$G(X)$ is a covariant of index zero, $y$ a covariant of index (-1) 
and $C_1$ is an invariant of index (-1). In the case of the Type 
A series the coefficient of each term is an invariant of index zero. 









George Washington University. 











QUADRATURE OF THE NORMAL CURVE 
By 


E. R. ENLow 


There are three formulas for the calculation of areas under 
the normal probability curve, only two of which seem to be 
generally recognized in American statistical circles. Herewith is 
presented an outline of the mathematical development of these 
three formulas and a determination of the bounds of practical 
utility of each. 


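The well-known equation for the normal curve, 

$$y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-\frac{x^2}{2\sigma^2}}$$

may be expanded into the series 

$$y = \frac{N}{\sigma\sqrt{2\pi}}\left[1 - \frac{x^2}{2\sigma^2} + \frac{1}{2!}\left(\frac{x^2}{2\sigma^2}\right)^2 - \frac{1}{3!}\left(\frac{x^2}{2\sigma^2}\right)^3 + \cdots\right] \qquad \text{(Ref. 3)}$$

by means of Maclaurin’s Theorem. (See any good calculus textbook.) (7) 
This expansion is readily accomplished by making 
the substitution 

$$t = \frac{x}{\sigma\sqrt{2}}$$

so that 

$$t^2 = \frac{x^2}{2\sigma^2}$$

and 

$$f(t) = e^{-t^2}.$$
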
The process of successive differentiation is quite lengthy, 
since every other term differentiated becomes zero and therefore 
2n terms in the Maclaurin series are required to produce n terms 
in the new series. After the expansion has been carried to five 
or six terms, a regular law of formation becomes evident from 
inspection of the new series 

$$e^{-t^2} = 1 - t^2 + \frac{t^4}{2!} - \frac{t^6}{3!} + \frac{t^8}{4!} - \cdots$$





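viz.: 

$$n\text{th term} = (-1)^{n-1}\,\frac{t^{2n-2}}{(n-1)!}.$$

After making the reconversion $t = \dfrac{x}{\sigma\sqrt{2}}$ 
and substituting the value of $e^{-t^2}$ in the original equation for 
the normal curve, we have 

$$y = \frac{N}{\sigma\sqrt{2\pi}}\left[1 - \frac{x^2}{2\sigma^2} + \frac{1}{2!}\left(\frac{x^2}{2\sigma^2}\right)^2 - \frac{1}{3!}\left(\frac{x^2}{2\sigma^2}\right)^3 + \cdots\right] \qquad \text{(Ref. 1, 7)}$$

as previously indicated. This series is uniformly convergent and 
may therefore be integrated term by term. 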

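The area under all or any portion of the normal curve is 
calculated from the integral of the equation for the curve: 

$$\int y\,dx = \frac{N}{\sigma\sqrt{2\pi}}\int e^{-\frac{x^2}{2\sigma^2}}\,dx.$$

To simplify the procedure, let $x = \sigma\sqrt{2}\,t$. 
Then $dx = \sigma\sqrt{2}\,dt$ and 

$$\int y\,dx = \frac{N}{\sigma\sqrt{2\pi}}\int e^{-t^2}\,\sigma\sqrt{2}\,dt = \frac{N}{\sqrt{\pi}}\int e^{-t^2}\,dt = \frac{1}{\sqrt{\pi}}\int e^{-t^2}\,dt \quad (\text{when } N = 1).$$

The value of the definite integral representing the area between 
the ordinate at the mean and the general ordinate whose abscissa 
is $t$ is 

$$\int_0^x y\,dx = \frac{1}{\sqrt{\pi}}\int_0^t e^{-t^2}\,dt.$$

Then, integrating the expanded series for $e^{-t^2}$ (above) term 
by term, we have: 

$$\int_0^t\left[1 - t^2 + \frac{t^4}{2!} - \frac{t^6}{3!} + \cdots\right]dt = t - \frac{t^3}{3} + \frac{t^5}{5\cdot 2!} - \frac{t^7}{7\cdot 3!} + \frac{t^9}{9\cdot 4!} - \frac{t^{11}}{11\cdot 5!} + \cdots$$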



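Substituting in this series the value of $t = \dfrac{x}{\sigma\sqrt{2}}$ and keeping 
the expression $\dfrac{x}{\sigma}$ separate, we obtain: 

$$\int_0^x y\,dx = \frac{1}{\sqrt{2\pi}}\left[\left(\frac{x}{\sigma}\right) - \frac{\left(\frac{x}{\sigma}\right)^3}{2\cdot 3\cdot 1!} + \frac{\left(\frac{x}{\sigma}\right)^5}{2^2\cdot 5\cdot 2!} - \frac{\left(\frac{x}{\sigma}\right)^7}{2^3\cdot 7\cdot 3!} + \frac{\left(\frac{x}{\sigma}\right)^9}{2^4\cdot 9\cdot 4!} - \cdots\right]$$

This series may be extended indefinitely, since the general term 
is seen to be 

$$(-1)^{n-1}\,\frac{\left(\frac{x}{\sigma}\right)^{2n-1}}{2^{\,n-1}\,(2n-1)\,(n-1)!}.$$

It will be referred to hereafter as Series A. 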

A published statement that Series A is divergent when $t > 1$ 
is erroneous (7). It is always convergent, regardless of the value 
of the deviate, but converges very slowly when $t$ is not small, 
in which case it is better to use another series obtained by 
integrating by parts. (1, 7) 
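Series A is straightforward to evaluate by machine. The sketch below uses the general term $(-1)^{n-1}(x/\sigma)^{2n-1} / \left[2^{n-1}(2n-1)(n-1)!\right]$ with the factor $1/\sqrt{2\pi}$; the function names and the use of Python's `math.erf` as an independent check are choices made here, not part of the paper.

```python
import math

def series_a(z, terms):
    """Partial sum of Series A: area under the unit normal curve from 0 to z."""
    total = 0.0
    for n in range(1, terms + 1):
        denom = 2 ** (n - 1) * (2 * n - 1) * math.factorial(n - 1)
        total += (-1) ** (n - 1) * z ** (2 * n - 1) / denom
    return total / math.sqrt(2 * math.pi)

def exact_area(z):
    """Reference value of the same area from the error function."""
    return 0.5 * math.erf(z / math.sqrt(2))

for z, terms in [(0.25, 3), (1.0, 8), (3.0, 30)]:
    print(z, terms, series_a(z, terms), exact_area(z))
```

The number of terms needed grows quickly with `z`, which is the slow convergence for large deviates described in the text.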

We may write 


hen = fl-# Jl-2t€ ‘det J 
fl-2)L4*)) 
[4] [dv] 
fem = —s ay f+ ie 
(Formula) [Sedu] = - [J Vv du | 


Integrating the integral expression in in the last term above, i.e., ' 


Then 





in like manner (by parts), another term appears, and the equation 
above becomes 


f e* i t 
_ + 2 _ e 
e @f «a as ors + Zs Ss At - 


2ét 




Continuing this process by breaking up the integral on the 
right into another term in the series plus a new integral, repeated- 
ly, produces the infinite series: 


~ z.. 


J =. 
Ze 2.5e 3-5:7€ 


ame = 


16t” 3279 


AZ _ WBS 4 W305 | 
"avy ep ey J. 


co °o za 
— a 
Je a ~ Je ale 


t 


a 


tes vr” 


} 2 


t , = ad LL 
we Ur walle 
Je dt =. + -Se AL. 


And since the value of the definite integral 


wD _Fo5 
fe er is a > ie “Qey Qe) “a 
2 


Then 


Therefore 


re 


ut 


fo%, ” a all Pe ee eee 
ft = * a oo 7° Gor 1, 


and, since 2 
ts je ot L. J. = 
o Se = dx = un € dt > 


(when W=/ and #- az ) 


* Good proof in The Encyclopedia Brittanica, American Edition, 1896 
in article on Infinitesimal Calculus. Also (2). 
































t ae 
de [ene - $f - €. fig 
Ur } oh £ Ur - 
. at. ~ ez. Ws: 
(2t*) (2t*) Qt)” 
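Substituting $\dfrac{x}{\sigma\sqrt{2}}$ for $t$ and keeping $\dfrac{x}{\sigma}$ separate, this series 
becomes: 

$$\int_0^x y\,dx = \frac{1}{2} - \frac{e^{-\frac{x^2}{2\sigma^2}}}{\left(\frac{x}{\sigma}\right)\sqrt{2\pi}}\left[1 - \frac{1}{\left(\frac{x}{\sigma}\right)^2} + \frac{1\cdot 3}{\left(\frac{x}{\sigma}\right)^4} - \frac{1\cdot 3\cdot 5}{\left(\frac{x}{\sigma}\right)^6} + \cdots\right]$$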





This series will be referred to as Series B. It is asymptotic 
or semi-convergent, (5) (8), a type of series which is frequently 
obtained by integrating by parts (6). Series B is divergent for 
values of $\frac{x}{\sigma}$ below unity. Weld (7) states that this series 
converges rapidly when $\frac{x}{\sigma} > \sqrt{2}$, but does not mention its peculiar 
asymptotic nature whereby it converges until a minimum term 
is reached and then diverges. As Townsend (6) indicates, the 
best approximation of the sum of an asymptotic series is obtained 
if the series is terminated with the term having the smallest 
absolute value. This minimum term is the second term for 
$\frac{x}{\sigma} = \sqrt{2}$ and while the error due to dropping the succeeding 
terms is less than the last term retained, still this second term 
has too large a value to permit any very accurate calculation of 
the area (as will be shown later). However, the accuracy increases 
as $\frac{x}{\sigma}$ takes on larger values, since it then takes longer 
for convergence to the minimum term and this minimum term 
also grows smaller. 
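The rule of stopping an asymptotic series at its smallest term can be sketched as follows. The bracket used is $1 - 1/z^2 + 1\cdot 3/z^4 - 1\cdot 3\cdot 5/z^6 + \cdots$ with $z = x/\sigma$, multiplied by $e^{-z^2/2}/(z\sqrt{2\pi})$ and subtracted from one half; the function names, the `max_terms` guard and the comparison against `math.erf` are assumptions made for this illustration.

```python
import math

def series_b(z, max_terms=50):
    """Area from 0 to z by Series B, terminated at the term of smallest size."""
    prefactor = math.exp(-z * z / 2) / (z * math.sqrt(2 * math.pi))
    term = 1.0       # current term of the bracket, starting with 1
    bracket = term
    for k in range(max_terms):
        next_term = -term * (2 * k + 1) / (z * z)
        if abs(next_term) >= abs(term):
            break    # terms have stopped shrinking: `term` was the smallest
        bracket += next_term
        term = next_term
    return 0.5 - prefactor * bracket

def exact_area(z):
    """Reference value of the same area from the error function."""
    return 0.5 * math.erf(z / math.sqrt(2))

for z in (1.5, 3.0, 5.0):
    print(z, series_b(z), exact_area(z), abs(series_b(z) - exact_area(z)))
```

The printed errors shrink rapidly as `z` grows, while for `z` near 1.5 the attainable accuracy stays poor because the smallest term is still large.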

Brunt (1) advocates the use of another series, developed 
by Schlömilch, which he states is better when $\frac{x}{\sigma}$ is large. This 
Schlömilch series, hereinafter referred to as Series S, is as follows, 
in terms of $\frac{x}{\sigma}$: 


/ / _ 
f  (B)yrz f(rz} {Eye} 


5 9 
aefaryarg ” I eyeiepsr Tas} 
129 315 * 


~ F)+25 ++ $C * Y+10§ id i z)+2} cece SCH} 


Series S is readily developed from Series B by transformation 
of successive terms in the B series to terms with the characteristic 
Schlömilch denominator. This is more easily accomplished when 
Series B is in the “$t$” $\left(= \frac{x}{\sigma\sqrt{2}}\right)$ form. 

To determine the limits of practical utility of each of these 
three series, actual calculations of areas were made at appropriate 
$\frac{x}{\sigma}$ intervals, with results shown in Table 1 and Figure 1. All 
calculations were made “by hand” and carried to 10 or more 
decimal places. 

The three series (formulas) were used in the following 
forms : 

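SERIES A: 

$$\text{Area}\,\Big|_0^{x/\sigma} = (.398\,942\,280\,3)\left(\frac{x}{\sigma}\right)\Bigg[1 - \frac{(x/\sigma)^2}{6} + \frac{(x/\sigma)^4}{40} - \frac{(x/\sigma)^6}{336} + \frac{(x/\sigma)^8}{3456} - \frac{(x/\sigma)^{10}}{42\,240} + \frac{(x/\sigma)^{12}}{599\,040}$$
$$\qquad - \frac{(x/\sigma)^{14}}{9\,676\,800} + \frac{(x/\sigma)^{16}}{175\,472\,640} - \frac{(x/\sigma)^{18}}{3\,530\,096\,640} + \frac{(x/\sigma)^{20}}{78\,033\,715\,200} - \frac{(x/\sigma)^{22}}{1\,880\,240\,947\,200}\Bigg]$$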


* This term not given by Brunt (1), but calculated by present writer. 
Last term practicable to use, since next term also has plus sign. 

















SERIES B:—, — 
= 5 x 
Area | = > in fog [o ~ } feg T — 


oO 


(=) : — 
a (.434 294 45) G) +.399 OFF 93 25}f/ (#) t 


a . 3. ¢ 102. ~ 995 4 10395 _ 135135, 
ey” CH) {> (#) -(®y? Gy" 


2027025 BHA 59425 , 654 72FV OTS 137 49F 105 75, | 
1G 1s OL 
(=) (¥) Cs)" Ce) 


SERIES S:—, 
Area |" t - tog fo- fost (3) (434 294 9819) 













/ 
q f - (ee 
rarer ast at} + Topas 
5 9 
CSS aE SSS hf Ceyeh ” PCr eat Serres 


IZAYG 3415 


{C$) +24 con 5 ($+ 105 ” Mevea}- PEI : 


Table 1 shows the numbers of terms required in each series 
to give areas accurate to 3, 4, 5, 7 and 9 decimal places, respectively, 
for values of $\frac{x}{\sigma}$ ranging from .25 to 5.00. Calculations 
were checked by Sheppard’s Tables (4) and the accuracy was 
determined on the principle that the error is less than the last 
term retained or the first term dropped *. Where x is used it 
indicates that the series cannot give the accuracy indicated. 


The graph, Figure 1, shows the approximate domain of 





* This does not hold strictly true of Series S, since it is a modification 
of the true Series B. 




utility of each series under conditions of accuracy ranging from 
3 to 9 decimal places. Note that Series S covers a wider range 
than does Series B, including the entire domain of Series B. 
Hence we may conclude that, while it is essential as a basis for 
the derivation of Series S, Series B may be discarded for area 
calculations. Moreover, Series S is not only more valuable than 
Series B because of its wider range of utility but also because 
its more rapid convergence gives a desired degree of accuracy 
with fewer terms than are necessary with Series B. 


FIGURE 1 
DOMAINS OF PRACTICAL UTILITY OF THREE INFINITE SERIES IN CALCULATING 
AREA UNDER NORMAL PROBABILITY CURVE. 

[Graph not reproduced. Horizontal axis: value of x/σ.] 


As an illustration of the use of this graph (Figure 1), we 
note that for five-decimal accuracy Series A must be used for all 
values of $\frac{x}{\sigma}$ up to 2.50, and that Series S may be used for 
$\frac{x}{\sigma} = 2.50$ and all larger values. Table 1 shows that the number of 
terms required in Series A for five-decimal accuracy increases 
from 3 at $\frac{x}{\sigma} = .25$ to approximately 14 at $\frac{x}{\sigma} = 2.25$; while, 
beginning at $\frac{x}{\sigma} = 2.50$, Series S requires but 4 terms, and this 
number diminishes to 1 term for $\frac{x}{\sigma} = 5.00$. 













TABLE 1. 
NUMBERS OF TERMS REQUIRED FOR VARYING DEGREES OF ACCURACY IN 
CALCULATION OF INCREASING PROPORTIONS OF AREA UNDER 
THE NORMAL PROBABILITY CURVE. 


[3 aecimats [4 decimals] 5 desimats 7 dgcimals [9 decimals 
$22 o BS Cen pao: 
x1 x St kt ¥ Si «xi XxX Aix} x 1 % 































25 2 x 
50 3 x x} x x! x 6 | *1 x 
75 4 x Kx} x A} ¥ 8} x] * 
1.00 4 K x) Xx x| X x 9| x! x 
1.25 51x x | Kx] x ¥| xX KOT i ei * 
1.50 7)xI|xf} 8|x| x x| x x fiz} x]. 
1.75 7| x15 ¥| ¥ x} x x }14*] Kl y 
2.00 Sixis K| 4 x} * ¥ 1 16*| x] x 
2.25 x13 x) 3 x| * x |} 18*) x | x 
2.50 2\2 x| 2 «| 4 % |] 20*| x] 
2.75 i be x) 4 xX 23*| x] x 
3.00 1} 1 3} 2 x| 3 4 \|27*| x| x 
3.50 1}1 1} 1 i 4 ~ |x*]7 
4.00 £41 21% 1 a — |x 16 
5.00 za 1}1 1 1 - i3 iz 


3 decimals |/4 decimals 5 decimals || 7 decimals ||9 decimals 





* Estimated by graphic extrapolation. 

Explanatory: Read table as follows: The number of terms required 
in Series A to calculate to 4 decimal places of accuracy the portion of the 
area under the normal curve lying between the ordinates at $\frac{x}{\sigma} = 0$ and 
$\frac{x}{\sigma} = 2.00$ is 10; with Series S it is 4; the calculation is impossible to 
this degree of accuracy with Series B. 

Notes: X indicates impossible calculation. — indicates impracti- 
cable calculation. 


CONCLUSIONS 
All areas under the normal curve may be calculated by the 
use of Series A and Series S, the two being complementary. 
Methods of developing Series A and Series B are outlined 
and it is indicated that Series S is derived from Series B. 
The domain of practical utility for each series is shown in 
Figure 1. The numbers of terms required for various degrees 
of accuracy are shown in Table 1. 


SELECTED REFERENCES 


1. Brunt, David, The Combination of Observations. Cambridge University 
Press, 1917. 

2. Elderton, W.P., Frequency Curves and Correlation. Layton, London, 
1927. 

3. Kelley, T.L., Statistical Method. Macmillan, 1923. 

4. Pearson, K., Tables for Statisticians and Biometricians. Part I, Second 
Edition (1924). Cambr. U. Press. 

5. Rietz, H.L., Editor, Handbook of Mathematical Statistics. Houghton 
Mifflin Co., 1924. 

6. Townsend, E.J., Functions of Real Variables. Holt, 1928. 

7. Weld, L.D., Theory of Errors and Least Squares. Macmillan, 1916. 

8. Wilson, E.B., Advanced Calculus. Ginn and Co., 1912. 











EDITORIAL: A. L. O’TOOLE 


ON A BEST VALUE OF R IN SAMPLES OF R 
FROM A FINITE POPULATION OF N, 


In recent years the problem of finding the moment coefficients 
for samples of $r$ drawn from a finite population of $n$ has 
been of interest to so many writers¹ that it seems worthwhile to 
make a few further observations² concerning these moment 
coefficients, particularly with respect to their dependence on $r$. In 
many instances the value of $r$ to be used is at the discretion of 
the investigator and he would like to know if there is one value 
of $r$ which is better than any other. An answer to that question 
will be given here. 


1H. C. Carver, On the fundamentals of the theory of sampling, An- 
nals of Mathematical Statistics, Vol. I, No. 1, pp. 101-121; Vol. I, No. 3, 
pp. 260-274. 

C. C. Craig, An application of Thiele’s semi-invariants to the sampling 
problem, Metron, Vol. VII, No. 4, 1928, pp. 3-74. 

R. A. Fisher, Moments and product moments of sampling distribu- 
tions, Proc. London Math. Soc., Series 2, xxix, 1929, pp. 309-321; xxx, 
1929, pp. 199-238. 

L. Isserlis, On a formula for the product moment coefficients of any 
order of a normal frequency distribution in any number of variables. Bio- 
metrika, xii, 1918-19, pp. 134-139. 

P. R. Rider, Moments of moments, Proc. of the National Academy 
of Sciences, Vol. 15, 1929, pp. 430-434. 

H. E. Soper, Sampling moments of samples of m units each drawn 
from an unchanging sampled population, from the point of view of semi- 
invariants, Journal of the Royal Statistical Soc., Vol. 93, 1930, pp. 104-114. 

A. A. Tchouproff, On the mathematical expectation of the moments of 
frequency distributions, Biometrika, xii, 1918-19, pp. 140-169 and 184-210; 
xiii, 1920-21, pp. 283-295. 

A. L. O’Toole, On symmetric functions and symmetric functions of 
symmetric functions, Annals of Mathematical Statistics, Vol. II, No. 2, May 
1931, pp. 102-149. See Chapter III. 

2 These observations arose as a result of some very far-reaching sug- 


gestions on the theory of sampling made by Professor Carver, during recent 
conversations with the writer. 







The differential operator method developed by this writer® 
for finding the moment coefficients not only was a very simple 
method but had the added advantage of leading directly to some 
theorems whose generality had not been established previously. 

Using the notation of the previous paper let the finite parent 
population of $n$ be composed of the $n$ variates $x_1, x_2, x_3, \ldots, x_n$. 
From this population draw all of the ${}_nC_r$ different 
samples and let $z_i = \sum_i x$; $i = 1, 2, 3, \ldots, {}_nC_r$, where 
$\sum_i x$ designates the sum of the $r$ values of $x$ which appear in 
the $i$th sample. With this notation it has been shown in the paper 
cited that 


c_J_K r J K 
(l) s 71> Cee te Sisx hee 
t-% A 


Ci1)™ (1) (41) (TIKI) 


where S, 3° 


The summation in (1) is to be taken over terms such that 

Titdgr KR + +--+ =t where [,J,K,...,4,4,%, .... 

are positive integers, and where /~ is obtained from the mth 
sampling polynomial FG) by replacing the exponents of the 
polynomial by corresponding subscripts. 


wie * 
} 


(2) Pe) aa Yl (-1) (xn ae - 


i=o ? 













° ; JO Pe 
In particular Pe a i= 2 -#, I= 6-344, +2 72, 


= f- 1p t lap, bp, 

~ 

Ie = f,- 5 Bt 50H CCfyt 2p, 

Ro = fp - 3 pt (180 p,- 399 p+ 3b%R- I2Cp, 

Ky = fp - 63 @ + 602 p,-2100 pf, +33 p-252% fot 720 pe, 
where 2 = wn Ory, K <2. 


It must be kept in mind that the multiplication of these oper- 
ators is symbolic. For example, to find F FP first multiply the 
polynomials F C#) and FE by ordinary multiplication and 
then the result when the exponents in this product are replaced 
by corresponding subscripts is os . 

Since in this paper it is desired to consider moments rather 
than power sums, replace 5,., by GC) /4,,,and $4 bY Wp... i 
Then (1) becomes, after dividing by , C 


nA? 
zx a IT+J+K Ir . 2 K 
wy wt -tl s BER wee 
we aon «) G7 .. TN Th K! 


Now Pp = 1-4 Dit ; REA, hence, 


di aol ast 
wl, W(n-')(~w-2):-* (w-Rtt) 


Substituting this value for each @/¢, in (4) the result is 


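Equations 5 

$$\mu'_{1:z} = r\,\mu'_{1:x}$$

$$\mu'_{2:z} = \frac{r}{n-1}\left[(r-1)\,n\,\mu'^{\,2}_{1:x} + (n-r)\,\mu'_{2:x}\right]$$

$$\mu'_{3:z} = \frac{r}{(n-1)(n-2)}\left[(r-1)(r-2)\,n^2\,\mu'^{\,3}_{1:x} + 3n\,(r-1)(n-r)\,\mu'_{1:x}\,\mu'_{2:x} + (n-r)(n-2r)\,\mu'_{3:x}\right]$$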







af 


Mes” ees, A NOAM A-2) Mins 
+ ow (r-Wlr-2)lv-r) ih Meas 
t+4nw (r-r) (v-72)(w-2nt 44" Me 
t aw (n-1) (w-A)(w-a-1) Pa 
+ (n-r~) (Wp 6Wwr ronrrn) a., | 
etc. 
Now let VW: a2. Then 


€gva tons G. 


f 


AL _ 


rh 1) a. + (a-t) Lt;, ad 
[2 Cr-Nlr-a)ib) + 3a. A-Wa-t) Ab, Bary 
+ (a-i)(a-2) AL, | 


anr-| 


cae au-2) 


a [ arn (2-1)r-2)(1-3) Le. 


(ar-'(anr-2)ar-3) 


+a Cr-1lr-2)(a-1)ul), 


ty4alr-a- Ji2@ -2)4 1} My, it 4 
+ 3a (2-1) (a-1) Sr la-)-"} 4x 
t(a-r(a=eare}tay sty, | 










A partial check at this point is to note that for @ = 1 only the first 
term of each of these moment coefficients remains. 
Let a@=2. Then the above moment coefficients become 








4 ‘ 
Ans : & ee. 
Athy = a Cr- ut, tt, an 

5 ZA- a 

/ ad ’ 
eg, = a [2G ajeti +346) 4c | 


/ 


(7) My:2 Speers [42 (2-2}r- 3) se! "pia nbr2jur ut. 


‘ 
t 4b, %, at alr1)al.. - wr) 


f / 
nonlin: A-3K A-4 ML. x fron fa- a)! a 
Ae.” ar a pot one 


t . ’ 

is zou! as * 150r-1) ae at 
’ ‘ 

= IM. t,x | 


etc. 

It is observed that when @ =2, i.e. when 77 = 2z, the mo- 
ment coefficient Maas is independent of the moment coefficient 
he - Also 4,', is independent of Ay. x - But one must 
not assume that all the odd moment coefficients of z are indepen- 
dent of the corresponding odd moment coefficients of x . For Mh. > 
is not independent of AL). as is seen by evaluating FE which 
is the coefficient of <4. in the expression for <4). , 

So far the moments considered have been the moments of x 
with respect to the origin from which x is measured and the mo- 
ments of 2 with respect to the origin from which # is measured. 
Consider now the moments of ~ about the mean value of x and 
the moments of # about the mean value of # . That is let 


ee . a es 
- = 2.-%, C= 42,3 °°: 




t Then 
Ait 
—* % 4 ML, =Z2x i since /1,-rl, by (5), 
Al Ait 
= 2 (x-M,)= Z=. 
eg. Z = 2Z- /t, ee x, —-2M, 


= (x,-/1,) + Cx- A) + O- ++ +00, ) 


= x, + x + 
a: 


zx. 


a, Gee 


Hence it is clear that Z is the same function of X as 2 is of x. In 


other words “,,. —(the moment of 2 about the mean of 2)— 


? is the same function of 24.., “4.., —., ----—(the moments 
ax ’ ’ ’ 

of x about the mean of x)—as <<), was of 46. , “4,. ,.+++- ° 
i. There is one important simplification however due to the fact that 


4,., = O and hence all terms which involve 2z,,,, vanish. With 


this in mind (6) becomes 


z 
a (a-1) 
Ma. = @ i. . ™ coms ‘ 
iz ? 2% aoe a:xX 9 


A455 = 2° (a-1Ma-2) | AL 
(ar -1 lan-2) 

a, 2 RON 12 @-1)-1] > 

OWE” 2:% 

(8) it pear arene ae : 

+ A (a2 7a7+122-6)+2 a(a-1) 
Gar-1)lar-2)lar-3) 14; 

. 10a (a-1))(a-2)(r-1) (an-r-') 

“lea Nae-ae-ties ay) On “en 


3.7% 


FZ 


p 22@-Ma-2)laln lar t2R+5a) yy. 
(Ar-1)(AN-2 War-3) (Qr-4 ) 
etc. 
Here again it is noticed that for @ = 2, ie. for wm =2,, ,, 
is independent of <4,., . In other words the skewness of the 
distribution of 2 is independent of the skewness of the parent 













population of x. Similarly <,,, is independent* of <2,.. and 
also independent of ;,, . But since © is not zero for a = 2, 
4,,, is not independent of ~,, . 
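The statement about skewness can be checked by direct enumeration on a small population. The sketch below is an illustration only: the population values are made up, and the second-moment relation it tests, $\mu_{2:z} = \frac{r(n-r)}{n-1}\,\mu_{2:x}$, is the standard finite-population result, assumed here to be what the first of equations (8) expresses with $n = ar$. Exact rational arithmetic is used so that a vanishing third moment shows up as exactly zero.

```python
from fractions import Fraction
from itertools import combinations

def central_moments(values, orders=(2, 3)):
    """Moments about the mean, as exact fractions, for the given orders."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return {k: sum((v - mean) ** k for v in values) / n for k in orders}

def sample_sum_moments(population, r):
    """Central moments of z, the sum of each of the C(n, r) possible samples."""
    sums = [sum(s) for s in combinations(population, r)]
    return central_moments(sums)

# Hypothetical, deliberately skew parent population with n = 6.
population = [1, 2, 4, 8, 16, 32]
n = len(population)
pop = central_moments(population)

for r in range(1, n):
    z = sample_sum_moments(population, r)
    # Second moment of z follows r(n - r)/(n - 1) times that of x.
    assert z[2] == Fraction(r * (n - r), n - 1) * pop[2]
    print(r, z[2], z[3])

z_half = sample_sum_moments(population, n // 2)
print(pop[3], z_half[3])  # parent third moment is non-zero; for n = 2r that of z is zero
```

With `n = 2r` every sample has a complementary sample, and the two sums add to the population total, so the distribution of `z` is symmetric whatever the skewness of the parent population.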

Now consider the variance of z , 


2G) 


2S 
anrA- 25 


ete 


_ 
= fa AL3:% (since W= a2). 
Obviously it would be very desirable to have the variance (squared 
standard deviation) a minimum. Since the variance is a function 


of 2 differentiate 42,., with respect tor. 


& Ahy-3 = War “2a:x 


w2% 4 =Oand hence w=2r 


2 minimum 
To make 2,., a “> 4. 


or, that is, @= 2. 
, 


‘ 5 aati (i € 
When a =2, he* 3 Migs. 4 o Wan =: 


In conclusion it may be said that there would seem to be
good reason to suggest that, when possible, the investigator
arrange to have twice as many variates in the control group or
parent population as in each of the samples to be analyzed. Taking
N = 2r will insure that the skewness of the samples will be
independent of the skewness of the parent population and also
that the fifth moment of the samples will be independent of the
fifth moment of the sampled population. In addition, taking
N = 2r will cause the variance (squared standard deviation)
of the samples to be a minimum. Choosing r = N/2 presumes,
of course, that N is an even number. But in most instances it should
be possible to arrange that N be even. For if an odd number of
observations is given, either another observation may be added or
one of the given observations deleted to make N even.
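
The statement about skewness can be checked by direct enumeration. The following is a minimal sketch, assuming that z denotes the sum of the r variates in a sample drawn without replacement (the sample mean differs only by the factor 1/r); the population values and the function name are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def central_moment_of_sample_sum(population, r, k):
    """k-th central moment of the sum of r values drawn without replacement,
    taken over all equally likely samples."""
    sums = [sum(sample) for sample in combinations(population, r)]
    mean = Fraction(sum(sums), len(sums))
    return sum((Fraction(s) - mean) ** k for s in sums) / len(sums)

# Hypothetical, deliberately skewed parent population with N = 6.
population = [1, 2, 2, 3, 7, 15]

third_half = central_moment_of_sample_sum(population, 3, 3)   # N = 2r
fifth_half = central_moment_of_sample_sum(population, 3, 5)   # N = 2r
third_other = central_moment_of_sample_sum(population, 2, 3)  # N = 3r
```

With N = 2r every sample has a complementary sample whose sum is the population total less its own, so the distribution of the sum is symmetrical and its odd central moments vanish whatever the skewness of the parent; with N = 3r the third moment does not vanish.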


4 E vanishes with FE because c = Ba-—12B). But RB is not 
a factor of i , 





EDITORIAL: H. C. CARVER 


PUNCHED CARD SYSTEMS AND STATISTICS 


Because of the increasingly important part being played by 
mechanical devices in statistical methodology, it seems desirable 
to call attention in the Annals to some of the possibilities of 
punched-card systems. 

The standard punched-card, illustrated below, is seven and 
one-half inches by three and one-quarter inches in size. To a 
certain extent the operation of a punched-card system is analogous 
to that of the Teletype machines used in wiring messages. In 
the latter case telegrams are written on a special typewriter which 
translates the message into a series of electrical impulses that 
in turn operate a distant typewriter which prints the message on 
a strip of paper; in the former, cards are automatically fed 
through a special typewriter that both prints the words or numbers 
on each card and also translates the information into properly 
punched holes in the card. The data on these cards may be
totaled if desired by running the cards through a tabulating
machine at a rate exceeding one per second; the total of the
squares of the numbers appearing on consecutive cards may be
obtained automatically; if the variates x and y be punched in
respective columns on each card, the total of the xy products for
all the cards is likewise made available; the cards may be arranged
in order of magnitude according to card number, date, or variates
at a rate exceeding six per second, and finally the data on the
cards may be printed on a scroll, the cards passing through the
listing or printing machine at about 80 per minute.

FIG. 1
GEORGE S BROWN ENG 32 0275 128 640 168 342 260 343 202
[Facsimile of the punched card; the line above is the text printed on it.]

In order to provide an actual problem to serve as an illus- 
tration, I secured the anthropometric records of one thousand 
first year male students who entered the University of Michigan 
in the fall of 1928. The 1,000 cards, of which Figure 1 is a sample, 
were punched by an operator of average ability in slightly less 
than three hours. The data for the students were selected at 
random and the cards, as punched, were numbered consecutively 
from 1 to 1,000,—the card selected for Figure 1 being the 275th. 
The weight to the nearest pound of this individual was 125
pounds, and the height, width of shoulders, and the circumferences
of chest, waist, hips and right thigh were, respectively, 64.0, 16.8,
34.2, 26.0, 34.3 and 20.2 inches, linear measurements being made to
the nearest one-tenth inch.

These 1,000 cards may now be placed in a tabulating and
listing machine which can total all seven of the data fields
simultaneously. If desired, the machine will also print all of the
information of each card on a scroll together with the totals. The first
and last parts of this scroll are reproduced below photographically,
the names of the individuals being omitted purposely. Because
of both the number of columns involved and the magnitudes of
the totals, the listed totals unfortunately run together. The
vertical lines, inserted with a pen, facilitate the reading of the totals.
The cards are totaled and listed simultaneously at the rate of 89
per minute.


FIG. 2
[Photographic reproduction of the first and last parts of the printed scroll, with the totals.]

An investigation of the correlation that may exist between
height and weight will involve the numerical value of

$$\sum_{i=1}^{1000} x_i y_i$$

where $x_i$ and $y_i$ designate the height and weight, respectively,
of the $i$th individual. The plugboard of an Automatic Multiplying
Punch may be wired in a few seconds so that
(a) the data of columns 34, 35 and 36 of Figure 1 will 
feed into the multiplier of the punch, 
(b) the data of columns 38, 39 and 40 will feed into the 
multiplicand, and then 
(c) the product, xy, for any card run through the machine 
will appear on the product summary counter. As the 
cards pass through the machine, the total of the prod- 
ucts is accumulated on this counter. 
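
The wiring in (a), (b) and (c) may be pictured in modern terms as reading two fixed-width fields from each card and accumulating their products. The sketch below is an illustration under stated assumptions: the 80-character card image, the helper names and the three sample cards are hypothetical, and the fields are treated as raw punched digits with no decimal scaling.

```python
def field(card, first_col, last_col):
    """Read the digits punched in 1-indexed columns first_col..last_col."""
    return int(card[first_col - 1:last_col])

def make_card(multiplier, multiplicand):
    """Build a hypothetical 80-column card image holding two 3-digit fields."""
    columns = [" "] * 80
    columns[33:36] = f"{multiplier:03d}"    # columns 34-36
    columns[37:40] = f"{multiplicand:03d}"  # columns 38-40
    return "".join(columns)

def product_total(cards):
    """Accumulate the products card by card, as the summary counter does."""
    total = 0
    for card in cards:
        total += field(card, 34, 36) * field(card, 38, 40)
    return total

# Three hypothetical cards, not taken from the 1,000 records.
deck = [make_card(640, 128), make_card(671, 135), make_card(702, 150)]
total = product_total(deck)
```

Only the running total is kept here; punching each product back into the card would correspond to storing the product alongside the card image.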
If desired, each product may be punched automatically in the
card, provided of course the card contains a sufficiently large
number of otherwise vacant columns. The maximum number of
digits in current models that may occur in either multiplier or
multiplicand is eight. The number of digits in the multiplicand
does not affect the speed of the multiplication; for three or fewer
digits in the multiplier the cards feed through the machine at
the rate of three seconds per card; for eight digits in the
multiplier the speed is twelve cards per minute. One may therefore
place our cards in the machine, press a button, resume other
duties, and some fifty minutes later the 1,000 cards will have
yielded the total

$$\sum xy = 9{,}477{,}433.6$$
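
The running time quoted above follows from the stated feed rates. A brief check, in which the function name is hypothetical:

```python
def minutes_for_deck(cards, seconds_per_card):
    """Machine time, in minutes, to pass a deck at a fixed rate per card."""
    return cards * seconds_per_card / 60

three_digit_run = minutes_for_deck(1000, 3)        # three seconds per card
eight_digit_run = minutes_for_deck(1000, 60 / 12)  # twelve cards per minute
```

The first figure is the fifty minutes mentioned in the text; the second shows what the same deck would require with an eight-digit multiplier.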
To obtain the sum of the squares of the variates in question
it is necessary only to double-wire the machine, one wire going
to the multiplier and the other to the multiplicand. We obtain
then


Fx = 4 GIS BIZNZ Zy*= 19 69Z 452 
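
Totals of this kind, together with the plain sums of x and y, are all that is needed for the coefficient of correlation. The following sketch assumes the usual product-moment formula; the five records are hypothetical and are not drawn from the 1,000 cards.

```python
from math import sqrt

def correlation_from_sums(n, sx, sy, sxy, sxx, syy):
    """Product-moment correlation computed only from accumulated totals."""
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / sqrt(var_x * var_y)

# Hypothetical five-card deck (heights in inches, weights in pounds).
heights = [64.0, 66.5, 67.9, 69.2, 72.3]
weights = [125, 131, 140, 146, 158]

n = len(heights)
r = correlation_from_sums(
    n,
    sum(heights),
    sum(weights),
    sum(x * y for x, y in zip(heights, weights)),
    sum(x * x for x in heights),
    sum(y * y for y in weights),
)
```

No grouping into classes enters at any point, since each total is formed from the individual variates.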

By permitting the machine to punch each value of $x^2$ in the
card, we may treat $x^2$ as the multiplicand and $x$ as the
multiplier and then obtain the sum of the cubes of the variates; or
by double-wiring $x^2$ obtain the sum of the fourth powers of the
variates. If, while accumulating the cubes of the variates, we let
the machine also punch each cube in the card, we may then obtain
the sum of the powers of the variates up to and including that
of the sixth order, etc. We are limited, of course, by the fact
that the card contains eighty columns.
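
The chain of multiplications may be sketched as follows. The pairing used for the fifth power (cube times square) is an assumption, since the text names only the cubes, the fourth powers and the limit of the sixth order; the function name and sample values are hypothetical.

```python
def power_sums(values):
    """Sums of powers 1..6, built only from pairwise multiplications."""
    totals = {k: 0 for k in range(1, 7)}
    for x in values:
        x2 = x * x      # x double-wired; the square is punched in the card
        x3 = x2 * x     # square times x; the cube is punched in the card
        x4 = x2 * x2    # square double-wired
        x5 = x3 * x2    # cube times square (assumed pairing)
        x6 = x3 * x3    # cube double-wired
        for k, term in zip(range(1, 7), (x, x2, x3, x4, x5, x6)):
            totals[k] += term
    return totals

sums = power_sums([1, 2, 3])
```

Each stored intermediate value occupies card columns, which is where the eighty-column limit makes itself felt.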

By running the punched cards through a sorting machine,
we may obtain very readily the frequency distribution of the
weights, and also the corresponding median, quartiles, etc. To
accomplish this the cards must be run through a sorting machine
three times, first sorting to column 35 of Figure 1, then to column
34 and finally to column 33. The cards pass through the sorting
machine at the rate of 400 per minute, so that in approximately
eight minutes, including time spent in handling the cards between
sorts, these 1,000 cards will be perfectly arranged according to
magnitude in weight. If the numbers with respect to which the
sort is to be made contain n digits, the cards must be run
through the sorter n times. We reproduce below
a photograph of the first part of a printed scroll obtained by
running the cards through the listing machine after they had
been sorted according to weight.
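
The procedure of sorting first on the last column and finally on the first is what is now called a least-significant-digit radix sort. A minimal sketch, in which each hypothetical card is reduced to its three weight digits and the function names are illustrative only:

```python
def sort_on_column(cards, column):
    """One pass of the sorter: drop cards into pockets 0-9, then restack."""
    pockets = [[] for _ in range(10)]
    for card in cards:
        pockets[int(card[column - 1])].append(card)
    return [card for pocket in pockets for card in pocket]

def sort_deck(cards, columns):
    """Sort on the units column first and on the leading column last."""
    for column in reversed(columns):
        cards = sort_on_column(cards, column)
    return cards

# Hypothetical deck; here the weight occupies columns 1-3 of each card.
deck = ["140", "125", "098", "152", "125", "131"]
ordered = sort_deck(deck, columns=[1, 2, 3])
```

The method depends on each pass preserving the order left by the previous one, which restacking pocket by pocket ensures. Three passes of 1,000 cards at 400 per minute take seven and one-half minutes of machine time, in keeping with the figure of approximately eight minutes.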


[Photograph of the first part of the printed scroll, listed after sorting by weight. Legible column labels: Weight, Height, Shoulder, Chest, Waist, Hip.]


A rough notion of the functional dependence that exists 
between weight and the other six variables recorded on the cards 
may be obtained by permitting the machine to total these ordered- 
with-respect-to-weight cards in consecutive groups of 100. That 
is, we obtain the averages for numerically equal groups selected 
according to the weight-deciles. The six regression lines may 
therefore be plotted, approximately, from the following results: 


TABLE 1.
ANTHROPOMETRIC AVERAGES BASED ON WEIGHT DECILES.

Inter-decile Range | Weight | Height | Sh'der | Chest  | Waist | Hips | Rt.Th.

First              | 112.98 | 65.133 | 15.576 | 32.607 |       |      |
Second             | 122.41 | 66.659 | 15.980 | 33.663 |       |      | 18.926
Third              | 127.85 | 67.087 | 16.161 | 34.421 |       |      | 19.206
Fourth             | 131.98 | 67.381 | 16.334 | 34.816 |       |      | 19.554
Fifth              | 135.62 | 67.937 | 16.406 | 34.860 |       |      | 19.893
Sixth              | 139.54 | 68.189 | 16.651 | 35.608 |       |      | 20.112
Seventh            | 143.87 | 68.576 | 16.789 | 35.766 |       |      | 20.438
Eighth             | 149.43 | 68.895 | 16.807 | 36.116 |       |      | 20.712
Ninth              | 156.01 | 69.185 | 17.022 | 36.788 |       |      | 21.444
Tenth              | 173.19 | 69.854 | 17.611 | 38.596 |       |      |
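
The averaging by consecutive groups of ordered cards can be sketched as below. The records, field names and group size of two are hypothetical; the text uses groups of 100 cards and seven fields.

```python
def group_averages(records, key, group_size):
    """Order records by `key`, then average every field over consecutive groups."""
    ordered = sorted(records, key=lambda rec: rec[key])
    averages = []
    for start in range(0, len(ordered), group_size):
        group = ordered[start:start + group_size]
        averages.append(
            {name: sum(rec[name] for rec in group) / len(group) for name in group[0]}
        )
    return averages

# Hypothetical six-card deck, averaged in groups of two instead of 100.
records = [
    {"weight": 150, "height": 70.0},
    {"weight": 120, "height": 64.0},
    {"weight": 135, "height": 68.0},
    {"weight": 125, "height": 66.0},
    {"weight": 160, "height": 69.0},
    {"weight": 140, "height": 67.0},
]
table = group_averages(records, key="weight", group_size=2)
```

Plotting each averaged field against the averaged weight gives the rough regression lines spoken of in the text; passing a different `key` orders the cards by another variable.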





If we had arranged the cards numerically with respect to 
height, instead of weight, we would have obtained the following 
results : 


TABLE 2.
ANTHROPOMETRIC AVERAGES BASED ON HEIGHT DECILES.

Inter-decile Range | Weight | Height | Sh'der | Chest  | Waist  | Hips   | Rt.Th.

First              | 123.95 | 63.339 | 16.036 | 34.201 | 27.371 | 34.217 | 19.592
Second             | 130.97 | 65.295 | 16.261 | 34.856 | 27.984 | 34.944 | 19.929
Third              | 133.75 | 66.367 | 16.282 | 34.787 | 27.731 | 35.152 | 19.926
Fourth             | 136.28 | 67.021 | 16.570 | 35.429 | 28.329 | 35.302 | 20.084
Fifth              | 139.81 | 67.623 | 16.587 | 35.379 | 28.206 | 35.499 | 20.223
Sixth              | 140.60 | 68.189 | 16.532 | 35.510 | 28.184 | 35.560 | 20.008
Seventh            | 142.65 | 68.806 | 16.659 | 35.550 | 28.380 | 35.821 | 20.315
Eighth             | 143.44 | 69.498 | 16.638 | 35.328 | 28.065 | 35.858 | 20.205
Ninth              | 145.71 | 70.412 | 16.776 | 35.778 | 28.236 | 35.960 | 20.068
Tenth              | 155.72 | 72.346 | 16.996 | 36.423 | 29.024 | 36.852 | 20.746


A comparison of tables 1 and 2 reveals clearly that the right
thigh measurements are more highly correlated with weight than
with height, and this phenomenon appears also to be true, though
possibly less pronounced, for the shoulder, chest, waist and hip
measurements.


By sorting the cards according to the last recorded digit of
each weight, the tendency of observers to state results as
multiples of two and five becomes apparent.
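
A tally of final digits of this kind may be sketched as follows; the recorded weights are hypothetical and the function name is illustrative only.

```python
from collections import Counter

def final_digit_counts(values):
    """Count how often each digit 0-9 ends a recorded whole-number value."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]

# Hypothetical recorded weights in pounds.
weights = [125, 130, 140, 135, 128, 150, 142, 145, 160, 131]
counts = final_digit_counts(weights)
```

If the observers had no preference, the ten counts would be roughly equal; a heaping on 0 and 5 and on the even digits is the bias in question.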


TABLE 3.
RELATIVE FREQUENCY OF FINAL
DIGITS IN OBSERVED VARIATES.

Final Digit | Frequency
0           |
1           |
2           |
3           |
4           |
5           |
6           |
7           |
8           |
9           |

[The frequency column is illegible in this copy.]


The results presented in table 3 indicate that final digits of 1,
3, 7 and 9 are decidedly unpopular and this casts a reflection
upon the accuracy of the recorded measurements. This type of
bias is well known; in fact, census and mortality statistics
usually present the same phenomenon. Figure 4, which follows,
clearly illustrates this.


[Figure 4.]


SUMMARY 

The punching, sorting and listing machines provide a most 
economical and accurate method of recording and analyzing, non- 
mathematically, observational data. The punched-card system is 
especially effective in constructing frequency distributions and 
correlation tables when the data are very numerous. 

The recent development of the Automatic Multiplying Punch 
is unquestionably the most important contribution to the mechanics 
of mathematical statistics since the invention of adding and mul- 
tiplying machines. By enabling one to compute moments and pro- 
duct-moments exactly and without the grouping of variates about 
class-marks, corrections such as those due to Sheppard are un- 
necessary for practical computations. Indeed, it is even possible 
to evaluate linear functions of one or more variables with this 
machine, and subsequently print the graduated values on a scroll 
—these results being “rounded off” to any desired number of 
decimal places. 
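
For reference, the corrections attributed to Sheppard are usually stated as follows when the variates have been grouped into classes of width $h$. The symbols are chosen here for illustration and are not the author's notation: $\bar{\mu}_2$ and $\bar{\mu}_4$ denote the central moments computed from the class-marks, and $\mu_2$ and $\mu_4$ the corrected values.

$$\mu_2 = \bar{\mu}_2 - \frac{h^2}{12}, \qquad \mu_4 = \bar{\mu}_4 - \frac{h^2}{2}\,\bar{\mu}_2 + \frac{7h^4}{240}$$

When the moments are accumulated directly from the individual variates, as described above, there is no grouping and hence nothing to correct.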

In this article I have described only a few of the various 
machines employed in statistical and accounting practice. Readers 
may secure additional information from the International Business
Machines Corporation, Tabulating Machine Division, 270 Broad- 
way, New York City. 


BAC. 6-17-34, 











CONTENTS
BIOMETRIKA. Vol. XXVI, Parts I and II

I. On the $P_{\lambda_n}$ Test for Randomness: Remarks, Further Illustration, and Table for $P_{\lambda_n}$. By Florence N. David
II. On the Corrections for the Moment Coefficients of Frequency Distributions when the Start of the Frequency is one of the Characteristics to be determined. By E. S. Martin. With eleven figures in the Text. 12-58
III. On Asymptotic Formulae for the Hypergeometric Series. Part II. By O. L. Davies. With six figures in the Text. 59-107
IV. Die Statistik der Seltenen Ereignisse. Von Rolf Lüders. 108-128
V. On Certain Non-normal Symmetrical Frequency Distributions. By G. H. Hansmann. With twenty-five diagrams in the Text. 129-195
VI. A Biometric Study of the "Flatness" of the Facial Skeleton in Man. By T. L. Woo and G. M. Morant. With four figures in the Text. 196-250

MISCELLANEA:

(i) On certain Measures of Dependence between Statistical Variables. By Professor J. F. Steffensen. 251-255
(ii) Remarks on Professor Steffensen's Measure of Contingency. By Karl Pearson
(iii) Spearman's General Factor Again. By Professor Burton H. Camp. 260-261
(iv) Note on the Recurrence Formulae for the Moments of the Point Binomial. By A. A. Krishnaswami Ayyangar. 262-264
(v) A Note on the Incomplete Moments of the Hypergeometrical Series. By A. A. Krishnaswami Ayyangar. 264-265
(vi) On Simometers and their Handling. By Karl Pearson. 265-268

Edited by
Karl Pearson, University College, London, W.C. 1.





